Optimize a rankings page using PHP and MySQL

Optimize a rankings page using PHP and MySQL - php

I could really use some help optimizing a table on my website that is used to display rankings. I have been reading a lot on how to optimize queries and how to properly use indexes but even after implementing changes I thought would work, little improvement can be seen. My quick fix has been simply to use only the top 100,000 rankings (updated daily and stored in a different table) to improve the speed for now, but I really don't like that option.
So I have a table that stores the information for users that looks something like:
table 'cache':
id (Primary key)
name
region
country
score
There are other variables being stored about the user, but I don't think they are relevant here as they are not used in the rankings.
There are 3 basic ranking pages that a user can view:
A world view:
SELECT cache name,region,country,score FROM cache ORDER BY score DESC LIMIT 0,26
A region view:
SELECT name,region,country,score FROM cache WHERE region='Europe' ORDER BY score DESC LIMIT 0,26
and a country view:
SELECT name,region,country,score FROM cache WHERE region='Europe' AND country='Germany' ORDER BY score DESC LIMIT 0,26
I have tried almost every combination of indexes I can think of to help alleviate work for the database, and while some seem to help a little bit I can't find one that will only return 26 rows for both the region and country queries(with simply an index on 'score' the world rankings are blazing fast).
I feel like I might be missing something basic, any help would be much appreciated!
Little extra info: the cache table is currently around 920 megabytes with a little more than 800,000 rows total. If you could use any more info just let me know.

Your world rankings benefit from the score index because score is the only criteria in the query. The logical sequence it sorts on is built into the query. So that's good.
The other queries will benefit from an index on region. However, similar to what #Matt indicates, a composite index on region, country and score may be the best bet. Note, the three columns for the key should be in region, country, score sequence.

put ONE index on country, score, region. Too many indices will slow you down.

How many records are we talking about?
I am no SQL guru, but in this case, if just changes to indexes did not do the trick. I would consider playing around with the table structure to see what perf gains I could get.
Cache
id (pk)
LocationId (index)
name
score
Location
LocationId (pk)
CountryId (index) maybe
RegionId (index) maybe
Country
CountryId
name
Region
RegionId
name
Location
LocationId (Primary key)
CountryId
RegionId
Country
CountryId
name
Region
RegionId
name
Temp tables in Procs would allow you to select on Location Id in every case. It would reduce the over all complexity of the issue you are having: you would be troubleshooting 1 query plan, not 3.
The effort would be high, and the payoff would be until you were done, so I would suggest looking at the index approach first.
Good luck

Related

How can I optimize this simple database and query using php and mysql?

I pull a range (e.g. limit 72, 24) of games from a database according to which have been voted most popular. I have a separate table for tracking game data, and one for tracking individual votes for a game (rating from 1 to 5, one vote per user per game). A game is considered "most popular" or "more popular" when that game has the highest average rating of all the rating votes for said game. Games with less than 5 votes are not considered. Here is what the tables look like (two tables, "games" and "votes"):
games:
gameid(key)
gamename
thumburl
votes:
userid(key)
gameid(key)
rating
Now, I understand that there is something called an "index" which can speed up my queries by essentially pre-querying my tables and constructing a separate table of indices (I don't really know.. that's just my impression).
I've also read that mysql operates fastest when multiple queries can be condensed into one longer query (containing joins and nested select statements, I presume).
However, I am currently NOT using an index, and I am making multiple queries to get my final result.
What changes should be made to my database (if any -- including constructing index tables, etc.)? And what should my query look like?
Thank you.

Your query that calculates the average for every game could look like:
SELECT gamename, AVG(rating)
FROM games INNER JOIN votes ON games.gameid = votes.gameid
GROUP BY games.gameid
HAVING COUNT(*)>=5
ORDER BY avg(rating) DESC
LIMIT 0,25
You must have an index on gameid on both games and votes. (if you have defined gameid as a primary key on table games that is ok)

According to the MySQL documentation, an index is created when you designate a primary key at table creation. This is worth mentioning, because not all RDBMS's function this way.
I think you have the right idea here, with your "votes" table acting as a bridge between "games" and "user" to handle the many-to-many relationship. Just make sure that "userid" and "gameid" are indexed on the "votes" table.

If you have access to use InnoDB storage for your tables, you can create foreign keys on gameid in the votes table which will use the index created for your primary key in the games table. When you then perform a query which joins these two tables (e.g. ... INNER JOIN votes ON games.gameid = votes.gameid) it will use that index to speed things up.
Your understanding of an index is essentially correct — it basically creates a separate lookup table which it can use behind the scenes when the query is executed.
When using an index it is useful to use the EXPLAIN syntax (simply prepend your SELECT with EXPLAIN to try this out). The output it gives show you the list of possible keys available for the query as well as which key the query is using. This can be very helpful when optimising your query.

An index is a PHYSICAL DATA STRUCTURE which is used to help speed up retrieval type queries; it's not simply a table upon a table -> good for a concept though. Another concept is the way indexes work at the back of your text book (the only difference is with your book a search key could point to multiple pages / matches whereas with indexes a search key points to only one page/match). An index is defined by data structures so you could use a B+ tree index and there are even hash indexes. It's Database/Query optimization from the physical/internal level of the Database - I'm assuming that you know that you're working at the higher levels of the DBMS which is easier. An index is rooted within the internal levels and that make DB query optimization much more effective and interesting.
I've noticed from your question that you have not even developed the query as yet. Focus on the query first. Indexing comes after, as a matter of a fact, in any graduate or post graduate Database course, indexing falls under the maintenance of a Database and not necessarily the development.
Also N.B. I have seen quite many people say as a rule to make all primary keys indexes. This is not true. There are many instances where a primary key index would slow up the Database. Infact, if we were to go with only primary indexes then should use hash indexes since they work better than B+ trees!
In summary, it doesn't make sense to ask a question for a query and an index. Ask for help with the query first. Then given your tables (relational schema) and SQL query, then and only then could I advice you on the best index - remember its maintenance. We can't do maintanance if there is 0 development.
Kind Regards,
N.B. most questions concerning indexes at the post graduate level of many computing courses are as follows: we give the students a relational schema (i.e. your tables) and a query and then ask: critically suggest a suitable index for the following query on the tables ----> we can't ask a question like this if they dont have a query

GROUP BY and ORDER BY too slow. How to make faster?

I've trying to create some stats for my table but it has over 3 million rows so it is really slow.
I'm trying to find the most popular value for column name and also showing how many times it pops up.
I'm using this at the momment but it doesn't work cause its too slow and I just get errors.
$total = mysql_query("SELECT `name`, COUNT(*) as b FROM `people` GROUP BY `name` ORDER BY `b` DESC LIMIT 0,5;")or die(mysql_error());
As you may see I'm trying to get all the names and how many times that name has been used but only show the top 5 to hopefully speed it up.
I would like to be able to then do get the values like
while($row = mysql_fetch_array($result)){
echo $row['name'].': '.$row['b']."\r\n";
}
And it will show things like this;
Bob: 215
Steve: 120
Sophie: 118
RandomGuy: 50
RandomGirl: 50
I don't care much about ordering the names afterwards like RandomGirl and RandomGuy been the wrong way round.
I think I've have provided enough information. :) I would like the names to be case-insensitive if possible though. Bob should be the same as BoB, bOb, BOB and so on.
Thank-you for your time
Paul

Limiting results on the top 5 won't give you a lot of speed-up, you'll gain time in the result retrieval, but in mySQL side the whole table still needs to be parsed (to count).
You will speed-up your count query having index on name column, of course as only the index will be parsed and not the table.
Now if you really want to speed up the result and avoid parsing the name index when you need this result (which will still be quite slow if you really have millions of rows), then the only other solution is computing the stats when inserting, deleting or updating rows on this table. That is using triggers on this table to maintain a statistics table near this one. Then you will really only have a simple select query on this statistics table, with only 5 rows parsed. But you will slow down your inserts, delete and update operations (which are already quite slow, especially if you maintain indexes, so if the stats are important you should study this solution).

Do you have an index on name? It might help.

Since you are doing the counting/grouping and then sorting an index on name doesn't help at all MySql should go through all rows every time, there is no way to optimize this. You need to have a separate stats table like this:
CREATE TABLE name_stats( name VARCHAR(n), cnt INT, UNIQUE( name ), INDEX( cnt ) )
and you should update this table whenever you add a new row to 'people' table like this:
INSERT INTO name_stats VALUES( 'Bob', 1 ) ON DUPLICATE KEY UPDATE cnt = cnt + 1;
Querying this table for the list of top names should give you the results instantaneously.

MySQL table with 4,000,000 record?

The website I have to manage is a search engine for worker (yellow page style)
I have created a database like this:
People: <---- 4,000,000 records
id
name
address
id_activity <--- linked to the activites table
tel
fax
id_region <--- linked to the regions table
activites: <---- 1500 activites
id
name_activity
regions: <--- 95 regions
id
region_name
locations: <---- 4,000,000 records
id_people
lat
lon
So basically the request that I am having slow problem with is to select all the "workers" around a selecty city (select by the user)
The request I have created is fully working but takes 5-6 seconds to return results...
Basically I do a select on the table locations to select all the city in a certain radius and then join to the people table
SELECT people.*,id, lat, lng, poi,
(6371 * acos(cos(radians(plat)) * cos(radians(lat)) * cos(radians(lng) - radians(plon)) + sin(radians(plat)) * sin(radians(lat)))) AS distance
FROM locations,
people
WHERE locations.id = people.id
HAVING distance < dist
ORDER BY distance LIMIT 0 , 20;
My questions are:
Is my Database nicely designed? I don't know if it's a good idea to have 2 table with 4,000,000 records each. Is it OK to do a select on it?
Is my request badly designed?
How can I speed up the search?

The design looks normalized. This is what I would expect to see in most well designed databases. The amount of data in the tables is important, but secondary. However if there is a 1-to-1 correlation between People and Locations, as appears from your query, I would say the tables should be one table. This will certainly help.
Your SQL looks OK, though adding constraints to reduce the number of rows involved would help.
You need to index your tables. This is what will normally help most with slowness (as most developers don't consider database indexes at all).

There are a couple of basic things that could be making your query run slowly.
What are your indexes like on your tables? Have you declared primary keys on the tables? Joining two tables each with 4M rows without having indexes causes a lot of work on the DB. Make sure you get this right first.
If you've already built the right indexes for your DB you can look at caching data. You're doing a calculation in your query Are the locations (lat/lon) generally fixed? How often do they change? Are the items in your locations table actual places (cities, buildings, etc), or are they records of where the people have been (like Foursquare checkins)?
If your locations are places you can make a lot of nice optimizations if you isolate the parts of your data that change infrequently and pre-calculate the distances between them.
If all else fails, make sure your database server has enough RAM. If the server can keep your data in memory it will speed things up a lot.

Running queries on tables with more than 1million rows in

I am indexing all the columns that I use in my Where / Order by, is there anything else I can do to speed the queries up?
The queries are very simple, like:
SELECT COUNT(*)
FROM TABLE
WHERE user = id
AND other_column = 'something'`
I am using PHP 5, MySQL client version: 4.1.22 and my tables are MyISAM.

Talk to your DBA. Run your local equivalent of showplan. For a query like your sample, I would suspect that a covering index on the columns id and other_column would greatly speed up performance. (I assume user is a variable or niladic function).
A good general rule is the columns in the index should go from left to right in descending order of variance. That is, that column varying most rapidly in value should be the first column in the index and that column varying least rapidly should be the last column in the index. Seems counter intuitive, but there you go. The query optimizer likes narrowing things down as fast as possible.

If all your queries include a user id then you can start with the assumption that userid should be included in each of your indexes, probably as the first field. (Can we assume that the user id is highly selective? i.e. that any single user doesn't have more than several thousand records?)
So your indexes might be:
user + otherfield1
user + otherfield2
etc.
If your user id is really selective, like several dozen records, then just the index on that field should be pretty effective (sub-second return).
What's nice about a "user + otherfield" index is that mysql doesn't even need to look at the data records. The index has a pointer for each record and it can just count the pointers.

CREATE VIEW for MYSQL for last 30 days

I know i am writing query's wrong and when we get a lot of traffic, our database gets hit HARD and the page slows to a grind...
I think I need to write queries based on CREATE VIEW from the last 30 days from the CURDATE ?? But not sure where to begin or if this will be MORE efficient query for the database?
Anyways, here is a sample query I have written..
$query_Recordset6 = "SELECT `date`, title, category, url, comments
FROM cute_news
WHERE category LIKE '%45%'
ORDER BY `date` DESC";
Any help or suggestions would be great! I have about 11 queries like this, but I am confident if I could get help on one of these, then I can implement them to the rest!!

Putting a wildcard on the left side of a value comparison:
LIKE '%xyz'
...means that an index can not be used, even if one exists. Might want to consider using Full Text Searching (FTS), which means adding full text indexing.
Normalizing the data would be another step to consider - categories should likely be in a separate table.

SELECT `date`, title, category, url, comments
FROM cute_news
WHERE category LIKE '%45%'
ORDER BY `date` DESC
The LIKE '%45%' means a full table scan will need to be performed. Are you perhaps storing a list of categories in the column? If so creating a new table storing category and news_article_id will allow an index to be used to retrieve the matching records much more efficiently.

OK, time for psychic debugging.
In my mind's eye, I see that query performance would be improved considerably through database normalization, specifically by splitting the category multi-valued column into a a separate table that has two columns: the primary key for cute_news and the category ID.
This would also allow you to directly link said table to the categories table without having to parse it first.
Or, as Chris Date said: "Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else)."

Anything with LIKE '%XXX%' is going to be slow. Its a slow operation.
For something like categories, you might want to separate categories out into another table and use a foreign key in the cute_news table. That way you can have category_id, and use that in the query which will be MUCH faster.
Also, I'm not quite sure why you're talking about using CREATE VIEW. Views will not really help you for speed. Not unless its a materialized view, which MySQL doesn't suppose natively.

If your database is getting hit hard, the solution isn't to make a view (the view is still basically the same amount of work for the database to do), the solution is to cache the results.
This is especially applicable since, from what it sounds like, your data only needs to be refreshed once every 30 days.

I'd guess that your category column is a list of category values like "12,34,45,78" ?
This is not good relational database design. One reason it's not good is as you've discovered: it's incredibly slow to search for a substring that might appear in the middle of that list.
Some people have suggested using fulltext search instead of the LIKE predicate with wildcards, but in this case it's simpler to create another table so you can list one category value per row, with a reference back to your cute_news table:
CREATE TABLE cute_news_category (
news_id INT NOT NULL,
category INT NOT NULL,
PRIMARY KEY (news_id, category),
FOREIGN KEY (news_id) REFERENCES cute_news(news_id)
) ENGINE=InnoDB;
Then you can query and it'll go a lot faster:
SELECT n.`date`, n.title, c.category, n.url, n.comments
FROM cute_news n
JOIN cute_news_category c ON (n.news_id = c.news_id)
WHERE c.category = 45
ORDER BY n.`date` DESC

Any answer is a guess, show:
- the relevant SHOW CREATE TABLE outputs
- the EXPLAIN output from your common queries.
And Bill Karwin's comment certainly applies.
After all this & optimizing, sampling the data into a table with only the last 30 days could still be desired, in which case you're better of running a daily cronjob to do just that.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.