Running queries on tables with more than 1 million rows in PHP

I am indexing all the columns that I use in my Where / Order by, is there anything else I can do to speed the queries up?
The queries are very simple, like:
SELECT COUNT(*)
FROM TABLE
WHERE user = id
AND other_column = 'something'
I am using PHP 5, MySQL client version: 4.1.22 and my tables are MyISAM.

Talk to your DBA. Run your local equivalent of showplan (in MySQL, EXPLAIN). For a query like your sample, I would suspect that a covering index on the columns id and other_column would greatly speed up performance. (I assume user is a variable or niladic function.)
A good general rule is that the columns in the index should go from left to right in descending order of variance. That is, the column whose values vary most should be the first column in the index, and the column whose values vary least should be the last. It seems counterintuitive, but there you go: the query optimizer likes narrowing things down as fast as possible.

If all your queries include a user id, then you can start with the assumption that the user id should be included in each of your indexes, probably as the first field. (Can we assume that the user id is highly selective, i.e. that no single user has more than several thousand records?)
So your indexes might be:
user + otherfield1
user + otherfield2
etc.
If your user id is really selective, like several dozen records per user, then an index on that field alone should be pretty effective (sub-second return).
What's nice about a "user + otherfield" index is that MySQL doesn't even need to look at the data records: the index has an entry pointing to each record, so it can just count the matching index entries.
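To see the "count the index entries without touching the data" idea in action, here is a minimal sketch. It swaps in Python's built-in sqlite3 for MySQL (so the plan wording differs; in MySQL's EXPLAIN the equivalent hint is "Using index" in the Extra column), and the table name items, the column user_id (standing in for user) and the synthetic rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; user_id stands in for the question's "user" column.
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, user_id INTEGER, "
             "other_column TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO items (user_id, other_column, payload) VALUES (?, ?, ?)",
    [(i % 100, "something" if i % 3 == 0 else "else", "x" * 50)
     for i in range(10_000)],
)
# Composite "user + otherfield" index
conn.execute("CREATE INDEX idx_user_other ON items (user_id, other_column)")

query = "SELECT COUNT(*) FROM items WHERE user_id = ? AND other_column = ?"
# Column 3 of each EXPLAIN QUERY PLAN row is the human-readable detail text.
plan_rows = conn.execute("EXPLAIN QUERY PLAN " + query, (42, "something")).fetchall()
plan_text = " | ".join(row[3] for row in plan_rows)
count = conn.execute(query, (42, "something")).fetchone()[0]
print(plan_text)  # mentions a COVERING INDEX: the table rows are never read
print(count)
```

The word COVERING in the plan is the point: every column the query needs (user_id, other_column) lives in the index, so the count is answered from the index alone.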

Related

How can I speed up a SELECT on a table with 8,000,000 rows?

Hello, world!
I have read about this on the web, but I have not found a suitable solution. I am not an SQL pro. I have a very simple table with 10 columns and 8,000,000 rows, and a very simple query (I am working with PHP):
SELECT `field_name_1`, `field_name_2`, `field_name_3`
FROM `table_name`
WHERE `some_field`=#here_is_an_integer#
ORDER BY `some_field`
LIMIT 10;
Maybe some indexes, caching, or something like that?
If you have any ideas about this, I'd be glad for your help, or just point me in the direction I should follow to find the solution.
Thank you!

Add an index on some_field and, ideally, on every column you use in a WHERE clause.
If you only need to display the data, paginate it in SQL with LIMIT.
Also, as an administrator, tune caching and other MySQL (or MariaDB) limits for faster searching, as described in "Top 20+ MySQL Best Practices".
... simple answer, if you have space available, from
MySQL> ALTER TABLE table_name ADD INDEX ( some_field );
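A quick way to confirm that such an index actually changes the access path is to compare the query plan before and after creating it. The sketch below uses Python's sqlite3 as a stand-in for MySQL (MySQL's EXPLAIN would show type ALL versus ref instead), with the question's table and column names and made-up data; the index name idx_some_field is hypothetical.

```python
import sqlite3

def plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail lines joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(r[3] for r in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER PRIMARY KEY, some_field INTEGER, "
             "field_name_1 TEXT, field_name_2 TEXT, field_name_3 TEXT)")
conn.executemany(
    "INSERT INTO table_name (some_field, field_name_1, field_name_2, field_name_3) "
    "VALUES (?, 'a', 'b', 'c')",
    [(i % 1000,) for i in range(20_000)],
)
query = ("SELECT field_name_1, field_name_2, field_name_3 FROM table_name "
         "WHERE some_field = ? ORDER BY some_field LIMIT 10")

before = plan(conn, query, (7,))  # no index yet: full table scan
conn.execute("CREATE INDEX idx_some_field ON table_name (some_field)")
after = plan(conn, query, (7,))   # index search on some_field
rows = conn.execute(query, (7,)).fetchall()
print("before:", before)
print("after: ", after)
```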

Here are some things to think about:
Make sure every column used in a JOIN, WHERE, ORDER BY or GROUP BY has an appropriate index (unique or plain).
With many rows in a table, the server's memory cache must be large enough to hold the temporary result set.
Use InnoDB (slower inserts, faster selects), and tune InnoDB so it can hold the temporary result set.
Do not use RAND(), so that the result set can be stored in the query cache.
Filter early: where possible, put conditions in the JOIN clause (JOIN ... ON ... AND x = y) instead of in the final WHERE clause (unless, of course, the condition is about the main table). Perform (inner) joins in the most efficient order; for example, if you have many users and few reports, start by selecting the users, limiting the number of rows immediately before doing the other joins.

Perhaps you can formalize your question a bit better (maybe with the use of a question mark somewhere).
Anyway, if you want to speed up a SELECT on a single table like the one you describe, then at the very least the column in the WHERE clause should have a proper index. This could be just a standard KEY (some_field) if it is an integer type. If it is a string type (e.g. VARCHAR or VARBINARY) with reasonable cardinality within the first n bytes, you can use a prefix index such as KEY (some_field(8)) to minimize the index size, which in turn improves performance by reducing B-tree search time.
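One common way to pick the prefix length n is to compare the selectivity of the first n characters with that of the full column; in MySQL that would be something like SELECT COUNT(DISTINCT LEFT(some_field, n)) / COUNT(*) FROM table_name. The pure-Python sketch below does the same arithmetic on a hypothetical list of values; the 0.95 "close enough" threshold is an arbitrary assumption, not a MySQL rule.

```python
def prefix_selectivity(values, n):
    """COUNT(DISTINCT LEFT(col, n)) / COUNT(*) computed in Python."""
    return len({v[:n] for v in values}) / len(values)

def full_selectivity(values):
    """COUNT(DISTINCT col) / COUNT(*)."""
    return len(set(values)) / len(values)

def shortest_good_prefix(values, tolerance=0.95):
    """Smallest n whose prefix selectivity reaches `tolerance` of the full column's."""
    target = full_selectivity(values) * tolerance
    longest = max(len(v) for v in values)
    for n in range(1, longest + 1):
        if prefix_selectivity(values, n) >= target:
            return n
    return longest

# Hypothetical column contents
emails = ["alice@example.com", "alfred@example.com", "bob@example.com",
          "bobby@example.com", "carol@example.com", "carl@example.com"]
print(shortest_good_prefix(emails))  # a 4-character prefix already tells every row apart
```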
Ideally the server will have enough RAM (and a large enough innodb_buffer_pool_size, if you are using InnoDB) to keep that index in memory. Also look into https://dev.mysql.com/doc/refman/5.7/en/innodb-parameters.html#sysvar_innodb_adaptive_hash_index if your server has the available resources and your application has the data access patterns to justify it.
Hope this helps!

How to index a query the right way

I am trying to make my DB more optimized and am just beginning to index it, but I'm not sure how to do it right.
I have this query:
$thisYear = date("Y");
// Note: the mysql_* functions were removed in PHP 7; use mysqli or PDO in new code.
$playerid = (int)$playerid; // force an integer so it is safe to embed in SQL
// Total points for this player in the current year
$sql = mysql_query("SELECT SUM(points) AS userpoints
FROM {$prefix}_publicpoints
WHERE date BETWEEN '$thisYear-01-01' AND '$thisYear-12-31' AND fk_player_id = $playerid");
$row = mysql_fetch_assoc($sql);
// SUM() returns NULL when the player has no rows this year; treat that as 0
$userPoints = (float)$row['userpoints'];
// Players with more points this year; this player's rank is their count + 1
$sql = mysql_query("SELECT fk_player_id
FROM {$prefix}_publicpoints
WHERE date BETWEEN '$thisYear-01-01' AND '$thisYear-12-31'
GROUP BY fk_player_id
HAVING SUM(points) > $userPoints");
$userWrank = mysql_num_rows($sql) + 1;
I am not sure how to index this. I have tried indexing fk_player_id, but it still looks through all the rows (287937).
I have indexed the date field which gives me this back in EXPLAIN:
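id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | nf_publicpoints | range | IDXdate | IDXdate | 3 | NULL | 143969 | Using where with pushed condition; Using temporary...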
I also have two queries against the same table. Could that be done in one?
How do I index this and/or could it be done smarter?

You should definitely spend some time reading up on indexing, there's a lot written about it, and it's important to understand what's going on.
Broadly speaking, an index imposes an ordering on the rows of a table.
For simplicity's sake, imagine a table is just a big CSV file. Whenever a row is inserted, it's inserted at the end. So the "natural" ordering of the table is just the order in which rows were inserted.
Imagine you've got that CSV file loaded up in a very rudimentary spreadsheet application. All this spreadsheet does is display the data, and numbers the rows in sequential order.
Now imagine that you need to find all the rows that have some value "M" in the third column. Given what you have available, you have only one option: scan the table, checking the value of the third column for each row. If you've got a lot of rows, this method (a "table scan") can take a long time!
Now imagine that in addition to this table, you've got an index. This particular index is the index of values in the third column. The index lists all of the values from the third column, in some meaningful order (say, alphabetically) and for each of them, provides a list of row numbers where that value appears.
Now you have a good strategy for finding all the rows where the value of the third column is M! For instance, you can perform a binary search. Whereas the table scan requires you to look at N rows (where N is the number of rows), the binary search only requires that you look at about log₂ N index entries, in the very worst case. Wow, that's a lot easier!
Of course, if you have this index and you're adding rows to the table (at the end, since that's how our conceptual table works), you need to update the index each and every time. So you do a little more work while you're writing new rows, but you save a ton of time when you're searching for something.
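The spreadsheet analogy translates almost directly into code. In this toy Python sketch (not how a real database stores a B-tree), the "table" is a list of hypothetical rows, the "index" is a sorted list of (third-column value, row number) pairs, and a binary search via the standard bisect module finds the matching entries without examining every row.

```python
import bisect
import math
import string

# Hypothetical table: (row id, name, letter); we search on the third column.
rows = [(i, f"name{i}", string.ascii_uppercase[i % 26]) for i in range(1000)]

def table_scan(rows, target):
    """Examine every row; return (matching row numbers, rows examined)."""
    return [i for i, row in enumerate(rows) if row[2] == target], len(rows)

def build_index(rows):
    """The 'index': (third-column value, row number) pairs kept in sorted order."""
    return sorted((row[2], i) for i, row in enumerate(rows))

def index_lookup(index, target):
    """Binary-search for the first and last entries equal to target."""
    lo = bisect.bisect_left(index, (target, -1))
    hi = bisect.bisect_right(index, (target, len(index)))
    return [rownum for _, rownum in index[lo:hi]]

index = build_index(rows)
scan_matches, examined = table_scan(rows, "M")
index_matches = index_lookup(index, "M")
probes = math.ceil(math.log2(len(index)))  # roughly log2(N) comparisons per binary search
print(len(scan_matches), examined, probes)
```

Both approaches return the same row numbers, but the scan touches all 1000 rows while each binary search needs only about 10 comparisons.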
So, in general, indexing creates a tradeoff between read efficiency and write efficiency. With no indexes, inserts can be very fast -- the database engine just adds a row to the table. As you add indexes, the engine must update each index while performing the insert.
On the other hand, reads become a lot faster.
Hopefully that covers your first two questions (as others have answered -- you need to find the right balance).
Your third scenario is a little more complicated. If you're using LIKE, indexing engines will typically help with your read speed up to the first "%". In other words, if you're SELECTing WHERE column LIKE 'foo%bar%', the database will use the index to find all the rows where column starts with "foo", and then need to scan that intermediate rowset to find the subset that contains "bar". SELECT ... WHERE column LIKE '%bar%' can't use the index. I hope you can see why.
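Continuing the toy model, the LIKE behaviour can be sketched on a sorted list of hypothetical values: 'foo%' becomes one binary search plus a short forward read, 'foo%bar%' filters that small subset, and '%bar%' has no fixed prefix to search for, so every value must be checked. This ignores collations and case-insensitivity, which a real engine has to handle.

```python
import bisect

def prefix_range(sorted_values, prefix):
    """Emulate LIKE 'prefix%' on a sorted index: binary-search to the first
    value >= prefix, then read forward while values still start with it."""
    start = bisect.bisect_left(sorted_values, prefix)
    end = start
    while end < len(sorted_values) and sorted_values[end].startswith(prefix):
        end += 1
    return sorted_values[start:end]

def like_foo_pct_bar(sorted_values):
    """LIKE 'foo%bar%': use the index for 'foo', then scan only that subset for 'bar'."""
    return [v for v in prefix_range(sorted_values, "foo") if "bar" in v[3:]]

def like_pct_bar_pct(sorted_values):
    """LIKE '%bar%': no usable prefix, so every value must be examined."""
    return [v for v in sorted_values if "bar" in v]

values = sorted(["foobar", "foo", "foobaz", "fooxbar", "barfoo", "abar", "food"])
print(prefix_range(values, "foo"))
print(like_foo_pct_bar(values))
print(like_pct_bar_pct(values))
```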
Finally, you need to start thinking about indexes on more than one column. The concept is the same, and it behaves similarly to the LIKE case: essentially, if you have an index on (a, b, c), the engine will use the index from left to right as far as it can. So a search on column a might use the (a, b, c) index, as would one on (a, b). However, the engine would need to do a full table scan if you were searching WHERE b=5 AND c=1.
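The left-to-right rule can be checked by asking the planner. This sketch uses Python's sqlite3 rather than MySQL, with a hypothetical table t and synthetic data; the extra column d keeps the (a, b, c) index from covering SELECT *, and no ANALYZE is run (SQLite can only attempt a "skip-scan" of a leading column when ANALYZE statistics exist).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column d is not in the index, so the index cannot cover SELECT *.
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER, d TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, 'payload')",
                 [(i % 50, i % 7, i % 3) for i in range(5_000)])
conn.execute("CREATE INDEX idx_abc ON t (a, b, c)")

def plan(sql):
    """Return SQLite's EXPLAIN QUERY PLAN detail lines joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(r[3] for r in rows)

plans = {
    "a": plan("SELECT * FROM t WHERE a = 1"),            # leftmost prefix: index used
    "a,b": plan("SELECT * FROM t WHERE a = 1 AND b = 5"),  # leftmost prefix: index used
    "b,c": plan("SELECT * FROM t WHERE b = 5 AND c = 1"),  # not a prefix: full scan
}
for cols, p in plans.items():
    print(cols, "->", p)
```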
Hopefully this helps shed a little light, but I must reiterate that you're best off spending a few hours digging around for good articles that explain these things in depth. It's also a good idea to read your particular database server's documentation. The way indices are implemented and used by query planners can vary pretty widely.
For more information and examples, visit: http://blog.sqlauthority.com/category/sql-index/

Try creating an index on the date column; indexing fk_player_id will not help with this query. If that does not work, paste the EXPLAIN output.
For more information about indexes in MySQL, look here: http://hackmysql.com/case1

Why not index the date column, seeing how that's the main criterion that will be evaluated in the lookup?
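Here is a small sketch of what the date index buys for the ranking queries in the question. It uses Python's sqlite3 in place of MySQL, a table modelled on nf_publicpoints with invented sample rows, and a fixed year (2024) instead of date("Y"); the plan shows the date range being resolved through IDXdate, and the two queries reproduce the "rank = players with more points + 1" logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nf_publicpoints (id INTEGER PRIMARY KEY, "
             "fk_player_id INTEGER, date TEXT, points INTEGER)")
# Hypothetical sample data: player 3's 100 points fall in the previous year.
conn.executemany(
    "INSERT INTO nf_publicpoints (fk_player_id, date, points) VALUES (?, ?, ?)",
    [(1, "2024-02-01", 10), (1, "2024-05-01", 5), (2, "2024-03-15", 30),
     (3, "2024-07-04", 12), (3, "2023-12-31", 100)],
)
conn.execute("CREATE INDEX IDXdate ON nf_publicpoints (date)")

year_from, year_to = "2024-01-01", "2024-12-31"
rank_sql = """
SELECT fk_player_id FROM nf_publicpoints
WHERE date BETWEEN ? AND ?
GROUP BY fk_player_id
HAVING SUM(points) > ?"""
plan = " | ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + rank_sql,
                                              (year_from, year_to, 0)))

# COALESCE turns the NULL sum of a player with no rows into 0.
user_points = conn.execute(
    "SELECT COALESCE(SUM(points), 0) FROM nf_publicpoints "
    "WHERE date BETWEEN ? AND ? AND fk_player_id = ?",
    (year_from, year_to, 1)).fetchone()[0]
rank = len(conn.execute(rank_sql, (year_from, year_to, user_points)).fetchall()) + 1
print(plan)
print(user_points, rank)
```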

Should one use/create as many indices as possible in MySQL?

I realized that a MySQL query responds much faster when I create an index on the column used in ORDER BY, e.g.
SELECT username FROM table ORDER BY registration_date DESC
Now I'm wondering which indices I should create to optimize the request time.
For example I frequently use the following queries:
SELECT username FROM table WHERE
registration_date > ".(time() - 10000)."
SELECT username FROM table WHERE
registration_date > ".(time() - 10000)."
&& status='active'
SELECT username FROM table WHERE
status='active'
SELECT username FROM table ORDER BY registration_date DESC
SELECT username FROM table WHERE
registration_date > ".(time() - 10000)."
&& status='active'
ORDER BY birth_date DESC
Question 1:
Should I set up separate indices for the first three request types? (i.e. one index for the column "registration_date", one index for the column "status", and another index for the combination of both?)
Question 2:
Are different indices independently used for "WHERE" and for "ORDER BY"? Say, I have a combined index for the columns "status" and "registration_date", and another index only for the column "birth_date". Should I setup another combined index for the three columns ("status", "registration_date" and "birth_date")?

There are no hard-and-fast rules for indices or query optimization. Each case needs to be considered and examined.
Generally speaking, however, you can and should add indices to columns that you frequently sort by or use in WHERE clauses. (Answer to Question 2: no, the same indices are potentially used for ORDER BY and WHERE.) Whether to use a multi-column index or a single-column one depends on the frequency of queries. Also note that MySQL may combine single-column indices using the Index Merge Optimization:
The Index Merge method is used to retrieve rows with several range scans and to merge their results into one. The merge can produce unions, intersections, or unions-of-intersections of its underlying scans. This access method merges index scans from a single table; it does not merge scans across multiple tables.
(more reading: http://dev.mysql.com/doc/refman/5.0/en/index-merge-optimization.html)
Multi-column indices also require that you take care to structure your queries in such a way that your use of indexed columns matches the column order in the index:
MySQL cannot use an index if the columns do not form a leftmost prefix of the index. Suppose that you have the SELECT statements shown here:
SELECT * FROM tbl_name WHERE col1=val1;
SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;
SELECT * FROM tbl_name WHERE col2=val2;
SELECT * FROM tbl_name WHERE col2=val2 AND col3=val3;
If an index exists on (col1, col2, col3), only the first two queries use the index. The third and fourth queries do involve indexed columns, but (col2) and (col2, col3) are not leftmost prefixes of (col1, col2, col3).
Bear in mind that indices DO have a performance cost of their own: it is possible to "over-index" a table. Each time a record is inserted or an indexed column is modified, every affected index has to be updated. This demands resources, and depending on the size and structure of your table, it may reduce responsiveness while that index maintenance is running.
Use EXPLAIN to find out exactly what is happening in your queries. Analyze, experiment, and don't over-do it. The shotgun approach is not appropriate for database optimization.
Documentation
MySQL EXPLAIN - http://dev.mysql.com/doc/refman/5.0/en/explain.html
How MySQL uses indices - http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
Index Merge Optimization - http://dev.mysql.com/doc/refman/5.0/en/index-merge-optimization.html

To quote this page:
[Indices] will slow down your updates and inserts.
That's the tradeoff you have to calculate. To optimize your table, you should put indices only on the columns you are most likely to apply conditions to: the more indices you have, the slower your data-changing operations become. In that sense, I personally don't see much merit in creating combined indices. If you create all 7 possible combinations of indices for 3 columns, you are most definitely putting more drag on your updates and inserts than by just using 3 indices for 3 columns (and even that can be debatable). On the other hand, if the data is edited much, much less often than it is SELECTed, then indices can really help you speed things up.
Something else to take into consideration (again quoting the above page):
If your table is very small [...] it's worse to use an index than to leave it out and just let it do a table scan. Indexes really only come in handy with tables that have a lot of rows.

Yes, it is a good idea to have indexes on the columns you use often, both in ORDER BY and in your WHERE clauses.
But be aware: UPDATE, INSERT and DELETE statements slow down if you have indexes.
That is because after such an operation, the index must be updated too.
So, as a rule-of-thumb: If your application is read-intensive, use the indexes where you think they help.
If your application is often updating the data, be careful, because that may get slow because of the indexes.
When in doubt, you simply have to get your hands dirty and study the results of EXPLAIN.
http://dev.mysql.com/doc/refman/5.6/en/explain.html

As for the first two examples, you can satisfy them with one index: {registration_date, status}. Such an index can support filters on the first item (registration_date) or on both.
It does not work for status alone, however. The question for status is how selective it is, that is, what proportion of records have status = "active". If this is a high proportion (so that, on average, every database page would have an active record), then an index may not help very much.
The ORDER BY clauses are trickier. I don't know if MySQL uses indexes for this purpose. Often, using an index to sort entire records is less efficient than just sorting the records: using the index causes a random access pattern to the records in the pages, which can cause major performance problems for tables larger than the page cache.

Use EXPLAIN on your SELECT statements to determine where your joins are slowing down (the more rows that are referenced, the slower it will be). Then apply your indices to those columns.
EXPLAIN SELECT * FROM table1 JOIN table2 ON a = b WHERE conditions;

Optimizing a MySQL COUNT ORDER BY query

I have recently written a survey application that has done its job, and all the data is gathered. Now I have to analyze the data, and I'm having some timing issues.
I have to find out how many people selected each option and display it all.
I'm using this query, which does its job:
SELECT COUNT(*)
FROM survey
WHERE users = ? AND table = ? AND col = ? AND row = ? AND selected = ?
GROUP BY users,table,col,row,selected
As the "?" placeholders show, I'm using MySQLi (in PHP) to fetch the data when needed, but I fear this is what makes it so slow.
The table consists of all the columns above (plus a unique ID), and all of them are integers.
To explain some of the fields:
Each survey was divided into 3 or 4 tables (sized from 2x3 to 5x5) with a 1-to-10 happiness grade to select from. (The questions are on the right and top of the table; you answer where the questions intersect.)
users - age groups
table, row, col - explained above
selected - dooooh explained above
Now, with the surveys complete and around 1 million entries in the table, the query is getting very slow. Sometimes it takes about 3 minutes, and sometimes (I guess) the time limit expires and you get no data at all. I also don't have access to the full database, just my empty "testing" one, since the customer is kinda paranoid :S (and his server seems to be a bit slow).
Now (after the initial essay), my questions are: I left indexing out intentionally because, with a lot of data being written during the survey, it would have been a bad idea. But since no new data is coming in at this point, would it make sense to index all the fields of the table? How much sense does it make to index integers that never go above 10? (As you can guess, I haven't got a clue about indexes.) Do I need the primary unique ID in this table?
I read somewhere that indexing may help GROUP BY, but only if you group by the first columns in a table. Since my ID is first and, from my point of view, useless, can I remove it and gain anything by it?
Is there another way to write my query that would basically do the same thing but in a shorter period of time?
Thanks for all your suggestions in advance!

Add an index on the columns you use in GROUP BY or WHERE. In your case, that's ONE index incorporating users, table, col, row and selected.
Some quick rules:
Combine fields so that the WHERE columns come first and the GROUP BY columns last.
If you have other queries that use only part of it (e.g. users, table, col and selected), then put the missing column (row, in this example) last.
Don't use too many indexes, as each one slows down updates to the table slightly, so on really large systems you need to balance queries against indexes.
Edit: do you need GROUP BY users, col, row, given that these are already fixed in the WHERE? If the WHERE has already filtered them, you only need to group by "selected".
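A sketch of the single composite index suggested above, run against Python's sqlite3 instead of MySQL with synthetic survey rows. Column names follow the question; "table" and "row" are quoted because they are SQL keywords (in MySQL, TABLE is a reserved word, so the question's column would need backticks). The second query applies the edit's idea: fix four columns in WHERE and group only by selected, which returns every option's count in one round trip instead of one query per option.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE survey (id INTEGER PRIMARY KEY, users INTEGER, '
             '"table" INTEGER, col INTEGER, "row" INTEGER, selected INTEGER)')
conn.executemany(
    'INSERT INTO survey (users, "table", col, "row", selected) VALUES (?, ?, ?, ?, ?)',
    [(i % 5, i % 4, i % 3, i % 5, i % 10 + 1) for i in range(20_000)],
)
# One composite index: WHERE columns first, the GROUP BY column (selected) last.
conn.execute('CREATE INDEX idx_survey ON survey (users, "table", col, "row", selected)')

where5 = 'WHERE users = ? AND "table" = ? AND col = ? AND "row" = ? AND selected = ?'
count_sql = f"SELECT COUNT(*) FROM survey {where5}"
plan = " | ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + count_sql,
                                              (1, 1, 1, 1, 2)))
count = conn.execute(count_sql, (1, 1, 1, 1, 2)).fetchone()[0]

# All options for one cell at once, grouped only by "selected".
grouped = dict(conn.execute(
    'SELECT selected, COUNT(*) FROM survey '
    'WHERE users = ? AND "table" = ? AND col = ? AND "row" = ? GROUP BY selected',
    (1, 1, 1, 1)).fetchall())
print(plan)
print(count, grouped)
```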

How can I optimize this simple database and query using php and mysql?

I pull a range (e.g. limit 72, 24) of games from a database according to which have been voted most popular. I have a separate table for tracking game data, and one for tracking individual votes for a game (rating from 1 to 5, one vote per user per game). A game is considered "most popular" or "more popular" when that game has the highest average rating of all the rating votes for said game. Games with fewer than 5 votes are not considered. Here is what the tables look like (two tables, "games" and "votes"):
games:
gameid(key)
gamename
thumburl
votes:
userid(key)
gameid(key)
rating
Now, I understand that there is something called an "index" which can speed up my queries by essentially pre-querying my tables and constructing a separate table of indices (I don't really know; that's just my impression).
I've also read that mysql operates fastest when multiple queries can be condensed into one longer query (containing joins and nested select statements, I presume).
However, I am currently NOT using an index, and I am making multiple queries to get my final result.
What changes should be made to my database (if any -- including constructing index tables, etc.)? And what should my query look like?
Thank you.

Your query that calculates the average for every game could look like:
SELECT gamename, AVG(rating)
FROM games INNER JOIN votes ON games.gameid = votes.gameid
GROUP BY games.gameid
HAVING COUNT(*)>=5
ORDER BY avg(rating) DESC
LIMIT 0,25
You must have an index on gameid on both games and votes. (if you have defined gameid as a primary key on table games that is ok)
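To check that the query behaves as intended (averages per game, games with fewer than 5 votes dropped, best first), here is a runnable sketch using Python's sqlite3 instead of MySQL. The composite primary key on votes (userid, gameid) is an assumption based on the question's "one vote per user per game"; because it starts with userid, a separate index on votes.gameid is added for the join. The game names and votes are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE games (gameid INTEGER PRIMARY KEY, gamename TEXT, thumburl TEXT);
CREATE TABLE votes (userid INTEGER, gameid INTEGER, rating INTEGER,
                    PRIMARY KEY (userid, gameid));
-- the primary key starts with userid, so index gameid separately for the join
CREATE INDEX idx_votes_gameid ON votes (gameid);
""")
conn.executemany("INSERT INTO games (gameid, gamename) VALUES (?, ?)",
                 [(1, "Chess"), (2, "Go"), (3, "Snake")])
votes = ([(u, 1, 4) for u in range(6)]      # Chess: 6 votes, average 4.0
         + [(u, 2, 5) for u in range(5)]    # Go: 5 votes, average 5.0
         + [(u, 3, 5) for u in range(4)])   # Snake: only 4 votes, excluded
conn.executemany("INSERT INTO votes VALUES (?, ?, ?)", votes)

top = conn.execute("""
SELECT gamename, AVG(rating)
FROM games INNER JOIN votes ON games.gameid = votes.gameid
GROUP BY games.gameid
HAVING COUNT(*) >= 5
ORDER BY AVG(rating) DESC
LIMIT 0, 25""").fetchall()
print(top)
```

For the asker's paging (limit 72, 24), only the final LIMIT clause would change.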

According to the MySQL documentation, an index is created when you designate a primary key at table creation. This is worth mentioning, because not all RDBMS's function this way.
I think you have the right idea here, with your "votes" table acting as a bridge between "games" and "user" to handle the many-to-many relationship. Just make sure that "userid" and "gameid" are indexed on the "votes" table.

If you have access to use InnoDB storage for your tables, you can create foreign keys on gameid in the votes table which will use the index created for your primary key in the games table. When you then perform a query which joins these two tables (e.g. ... INNER JOIN votes ON games.gameid = votes.gameid) it will use that index to speed things up.
Your understanding of an index is essentially correct — it basically creates a separate lookup table which it can use behind the scenes when the query is executed.
When using an index, it is useful to use the EXPLAIN syntax (simply prepend EXPLAIN to your SELECT to try this out). The output shows you the list of possible keys available for the query as well as which key the query is using. This can be very helpful when optimising your query.

An index is a PHYSICAL DATA STRUCTURE used to help speed up retrieval queries; it's not simply a table on top of a table (though that's a good concept). Another way to think of it is the index at the back of your textbook (the only difference is that in your book a search key can point to multiple pages/matches, whereas with indexes a search key points to only one page/match). An index is defined by its data structure: you could use a B+ tree index, and there are even hash indexes. This is database/query optimization at the physical/internal level of the database. I'm assuming you know that you're working at the higher levels of the DBMS, which is easier. An index lives at the internal level, and that makes DB query optimization much more effective and interesting.
I've noticed from your question that you have not even developed the query yet. Focus on the query first. Indexing comes afterwards; as a matter of fact, in any graduate or postgraduate database course, indexing falls under database maintenance, not necessarily development.
Also N.B.: I have seen many people say, as a rule, to make all primary keys indexes. This is not true. There are many instances where a primary key index would slow the database down. In fact, if we were to use only primary key indexes, then we should use hash indexes, since they work better than B+ trees for exact-match lookups!
In summary, it doesn't make sense to ask a question about a query and an index together. Ask for help with the query first. Then, given your tables (relational schema) and SQL query, and only then, could I advise you on the best index. Remember, it's maintenance: we can't do maintenance if there is zero development.
Kind Regards,
N.B. Most questions about indexes at the postgraduate level of many computing courses look like this: we give the students a relational schema (i.e. your tables) and a query, and then ask them to critically suggest a suitable index for that query on those tables. We can't ask a question like this if they don't have a query.
