Should one use/create as many indices as possible in MySQL?

I noticed that a MySQL query responds much faster once there is an index on the column used in "ORDER BY", e.g.
SELECT username FROM table ORDER BY registration_date DESC
Now I'm wondering which indices I should create to optimize the request time.
For example I frequently use the following queries:
SELECT username FROM table WHERE
registration_date > ".(time() - 10000)."
SELECT username FROM table WHERE
registration_date > ".(time() - 10000)."
&& status='active'
SELECT username FROM table WHERE
status='active'
SELECT username FROM table ORDER BY registration_date DESC
SELECT username FROM table WHERE
registration_date > ".(time() - 10000)."
&& status='active'
ORDER BY birth_date DESC
Question 1:
Should I set up separate indices for the first three request types? (i.e. one index for the column "registration_date", one index for the column "status", and another index for the combination of both?)
Question 2:
Are different indices used independently for "WHERE" and for "ORDER BY"? Say I have a combined index for the columns "status" and "registration_date", and another index only for the column "birth_date". Should I set up another combined index for the three columns ("status", "registration_date" and "birth_date")?

There are no hard-and-fast rules for indices or query optimization. Each case needs to be considered and examined.
Generally speaking, however, you can and should add indices to columns that you frequently sort by or use in WHERE clauses. (Answer to Question 2: no, the same indices are potentially used for both ORDER BY and WHERE.) Whether to create a multi-column index or a single-column one depends on how often each kind of query runs. Also note that MySQL may combine single-column indices using the Index Merge Optimization:
The Index Merge method is used to retrieve rows with several range
scans and to merge their results into one. The merge can produce
unions, intersections, or unions-of-intersections of its underlying
scans. This access method merges index scans from a single table; it
does not merge scans across multiple tables.
(more reading: http://dev.mysql.com/doc/refman/5.0/en/index-merge-optimization.html)
Multi-column indices also require that you take care to structure your queries in such a way that your use of indexed columns matches the column order in the index:
MySQL cannot use an index if the columns do not form a leftmost
prefix of the index. Suppose that you have the SELECT statements shown
here:
SELECT * FROM tbl_name WHERE col1=val1;
SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;
SELECT * FROM tbl_name WHERE col2=val2;
SELECT * FROM tbl_name WHERE col2=val2 AND col3=val3;
If an index exists on (col1, col2, col3), only the first two queries
use the index. The third and fourth queries do involve indexed
columns, but (col2) and (col2, col3) are not leftmost prefixes of
(col1, col2, col3).
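The leftmost-prefix rule quoted above can be tried out in a small sketch. This one uses Python's built-in sqlite3 module purely as a stand-in (an assumption for illustration; MySQL's EXPLAIN output looks different and its optimizer makes its own choices), and the index name idx_c123 and the extra payload column are made up:

```python
import sqlite3

# SQLite is only a convenient stand-in here; it is NOT MySQL, but its
# B-tree composite indexes follow the same leftmost-prefix idea.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_name (col1 INT, col2 INT, col3 INT, payload TEXT)")
conn.execute("CREATE INDEX idx_c123 ON tbl_name (col1, col2, col3)")  # hypothetical name


def plan(sql):
    """Return the human-readable query plan text for one statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the detail text


# (col1, col2) is a leftmost prefix of (col1, col2, col3).
prefix_plan = plan("SELECT * FROM tbl_name WHERE col1 = 1 AND col2 = 2")
# (col2, col3) is not a leftmost prefix.
non_prefix_plan = plan("SELECT * FROM tbl_name WHERE col2 = 2 AND col3 = 3")

print("leftmost prefix:", prefix_plan)
print("not a prefix:   ", non_prefix_plan)
```

On MySQL itself, the equivalent check is to prepend EXPLAIN to each query and look at the key column of the output.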
Bear in mind that indices DO have a performance cost of their own -- it is possible to "over-index" a table. Each time a record is inserted or deleted, or an indexed column is modified, every affected index has to be updated as well. This does demand resources, and depending on the size and structure of your table, it may reduce responsiveness while those writes are in progress.
Use EXPLAIN to find out exactly what is happening in your queries. Analyze, experiment, and don't over-do it. The shotgun approach is not appropriate for database optimization.
Documentation
MySQL EXPLAIN - http://dev.mysql.com/doc/refman/5.0/en/explain.html
How MySQL uses indices - http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
Index Merge Optimization - http://dev.mysql.com/doc/refman/5.0/en/index-merge-optimization.html

To quote this page:
[Indices] will slow down your updates and inserts.
That's the tradeoff you have to calculate. To optimize your table, you should put indices only on the columns you are most likely to apply conditions to - the more indices you have, the slower your data-changing operations become. In that sense, I personally don't see much merit in creating combined indices - if you create all 7 possible combinations of indices for 3 columns, you are most definitely putting more drag on your updates and inserts than just using 3 indices for 3 columns (and even that can be debatable). On the other hand, if the data is being edited much, much less than it is being SELECTed, then indices can really help you speed things up.
Something else to take into consideration (again quoting the above page):
If your table is very small [...] it's worse to use an index than to leave it out and just let it do a table scan. Indexes really only come in handy with tables that have a lot of rows.

Yes, it is a good idea to have indexes on the columns you often use, both in ORDER BY and in your WHERE clauses.
But be aware: UPDATEs, INSERTs and DELETEs slow down if you have indexes.
That is because after such an operation, the index must be updated too.
So, as a rule of thumb: if your application is read-intensive, use indexes where you think they help.
If your application updates the data often, be careful, because the indexes may make those writes slow.
When in doubt, you simply have to get your hands dirty and study the results of EXPLAIN.
http://dev.mysql.com/doc/refman/5.6/en/explain.html

As for the first two examples, you can satisfy them with one index: {registration_date, status}. Such an index can support filters on the first item (registration_date) or on both.
It does not work for status alone, however. The question on status is how selective is the status. That is, what proportion of records have status = "active". If this is a high proportion (so, on average, every database page would have an active record), then an index may not help very much.
The ORDER BYs are trickier. MySQL can use an index to satisfy an ORDER BY, but whether that pays off depends on the query. Often, using an index to sort entire records is less efficient than just sorting the records: reading rows in index order causes a random access pattern to the records in the pages, which can cause major performance problems for tables larger than the page cache.

Use EXPLAIN on your SELECT statements to determine where your joins are slowing down (the more rows that are examined, the slower the query will be). Then add indices on those columns.
EXPLAIN SELECT * FROM table1 JOIN table2 ON a = b WHERE conditions;

Related

Ways to speed up a "select" from a db table with 8.000.000 rows?

Hello, world!
I have read about this on the web, but I have not found a suitable solution. I am not a pro in SQL. I have a very simple table with 10 columns and 8.000.000 rows, and a very simple query (I am working with PHP):
SELECT `field_name_1`, `field_name_2`, `field_name_3`
FROM `table_name`
WHERE `some_field`=#here_is_an_integer#
ORDER BY `some_field`
LIMIT 10;
Maybe some indexes, caching, or something like that would help.
If you have any thoughts on this, I would be glad of your help, or even just a pointer to the direction I should follow to find the solution.
Thank you!

Put an index on some_field and, ideally, on every column you use in a WHERE clause.
If you only want to display data, paginate in SQL with LIMIT.
As the admin, also tune caching and the other MySQL (or MariaDB) limits for faster lookups, as described in (Top 20+ MySQL Best Practices).
... simple answer, if you have space available, from the MySQL prompt:
mysql> ALTER TABLE table_name ADD INDEX ( some_field );

Here are some things to think about:
Make sure all fields that are joined on, or used in WHERE or ORDER / GROUP BY, have appropriate indexes (unique or just plain).
With many rows in a table, the memory cache of the server must be able to store the temporary resultset.
Use InnoDB for slower inserts and faster selects, and tune InnoDB to be able to store the temporary resultset.
Do not use RAND(), so that a resultset can be query-cached.
Filter early; try to put conditions on JOIN .. AND x=y, instead of on the final WHERE condition (unless of course it is about the main table). Do (inner) joins in the most efficient order; if you have many users and few reports, for example, start by selecting the users first, limiting the number of rows immediately before doing other joins.

Perhaps you can formalize your question a bit better (maybe with the use of a question mark somewhere).
Anyway, if you want to increase the speed of a select on a single table such as the one you describe, then at the very least the column involved in the WHERE clause should have a proper index. This could be just a standard 'KEY (some_field)' if it is an integer type. If it is a string type field (e.g. varchar or varbinary) with a reasonable amount of cardinality within the first n bytes, you can use a prefix index such as 'KEY (some_field(8))' to minimize the index size (which in turn improves performance by decreasing btree search time).
Ideally the server will have enough ram (and have a large enough innodb_buffer_pool_size if you are using innodb) to keep the aforementioned index in memory. Also look into https://dev.mysql.com/doc/refman/5.7/en/innodb-parameters.html#sysvar_innodb_adaptive_hash_index
... if your server has the available resources and your application has the data access patterns to justify it.
Hope this helps!

Multiple sql statements or loops and conditions

I have two tables employee and attendance.
employee : empID, empName
attendance: attendanceID, empID, date, inTime, outTime
I need to show this data in a grid with the employee name on the left and then the dates, so the column headers would be Emp Name, 1, 2, 3, 4, ..., 30. Every day of the month needs to be printed, with or without data.
I thought of the following ways to do this.
Get attendance and employee data in a join query ordered by empID. Then loop through the data and print a record if it matches the current date. This continues until the empID changes in the current loop.
Loop through the employees, then loop over the days in the month, and for every cell fetch the attendance from the database for that particular employee and date.
foreach ($employees as $emp)
{
    $empID = $emp['empID'];
    for ($day = 1; $day <= $maxDaysInTheMonth; $day++)
    {
        // One database round trip per employee per day
        $attendance = getAttendanceFromDatabase($empID, $day);
    }
}
To improve performance we try to minimize database connections and unnecessary loops. I would like to implement the second way, as it has the fewest conditions and loops and the code is clean. But it makes a database retrieval for every employee, every day. Can someone point out some facts about performance, please?

Fetching the records in a single query and looping through them is better, as it has to call the database server only once. The second way has to call the database server many times, which is costlier.
Then make an associative array from the data. The index would be the empID.
After generating the array you can use it as you want.
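The single-query approach described above could look roughly like the following sketch. It is written in Python with an in-memory SQLite database as a stand-in for the PHP/MySQL setup in the question; the sample rows, the attendance_grid helper and its parameters are assumptions made up for illustration:

```python
import sqlite3
from collections import defaultdict

# In-memory stand-in for the employee/attendance tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (empID INTEGER PRIMARY KEY, empName TEXT);
    CREATE TABLE attendance (attendanceID INTEGER PRIMARY KEY, empID INT,
                             date TEXT, inTime TEXT, outTime TEXT);
    INSERT INTO employee VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO attendance (empID, date, inTime, outTime) VALUES
        (1, '2024-04-01', '09:00', '17:00'),
        (1, '2024-04-03', '09:15', '17:30'),
        (2, '2024-04-02', '08:45', '16:50');
""")


def attendance_grid(conn, year_month, days_in_month):
    """Build one row per employee: [name, day 1, day 2, ..., day N]."""
    # One query for the whole month; LEFT JOIN keeps employees without records.
    rows = conn.execute(
        """SELECT e.empID, e.empName, a.date, a.inTime, a.outTime
           FROM employee e
           LEFT JOIN attendance a
                  ON a.empID = e.empID AND a.date LIKE ?
           ORDER BY e.empID""",
        (year_month + "-%",),
    ).fetchall()

    names = {}                   # empID -> name, in query order
    by_emp = defaultdict(dict)   # empID -> {day of month: (inTime, outTime)}
    for emp_id, name, date, in_time, out_time in rows:
        names[emp_id] = name
        if date is not None:
            by_emp[emp_id][int(date[8:10])] = (in_time, out_time)

    # Fill every day of the month, using None where there is no record.
    return [
        [name] + [by_emp[emp_id].get(day) for day in range(1, days_in_month + 1)]
        for emp_id, name in names.items()
    ]


grid = attendance_grid(conn, "2024-04", 30)
for row in grid:
    print(row[0], row[1:4])
```

The database is hit once per page render rather than once per employee per day; the per-cell lookup becomes a dictionary access in memory.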

Try this query
$sql="SELECT employee.empName AS empName, attendance.date AS date FROM employee,attendance WHERE employee.empID=attendance.empID";

As @Sougata suggests, fetching the records in a single query and looping through them is better. But keep in mind that the query's own performance can be improved as follows:
Avoid Multiple Joins in a Single Query
Try to avoid writing a SQL query using multiple joins that includes outer joins, cross apply, outer apply and other complex sub queries. It reduces the choices for Optimizer to decide the join order and join type. Sometime, Optimizer is forced to use nested loop joins, irrespective of the performance consequences for queries with excessively complex cross apply or sub queries
Avoid Use of Non-correlated Scalar Sub Query
You can re-write your query to remove non-correlated scalar sub query as a separate query instead of part of the main query and store the output in a variable, which can be referred to in the main query or later part of the batch. This will give better options to Optimizer, which may help to return accurate cardinality estimates along with a better plan.
Creation and Use of Indexes
An index can dramatically reduce data retrieval time, but it has the reverse effect on DML operations, which may degrade overall performance. Given this, indexing is a balancing act, but done well it can improve SQL query performance and give you the best query response time.
Create a Highly Selective Index
Selectivity defines the percentage of qualifying rows in the table (qualifying number of rows / total number of rows). If the ratio of the qualifying number of rows to the total number of rows is low, the index is highly selective and is most useful. A non-clustered index is most useful if the ratio is around 5% or less, which means the index can eliminate 95% of the rows from consideration. If an index returns more than 5% of the rows in a table, it probably will not be used; either a different index will be chosen or created, or the table will be scanned.
Position a Column in an Index
Order or position of a column in an index also plays a vital role to improve SQL query performance. An index can help to improve the SQL query performance if the criteria of the query matches the columns that are left most in the index key. As a best practice, most selective columns should be placed leftmost in the key of a non-clustered index.
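The selectivity ratio described above (qualifying rows / total rows) is simple enough to compute directly. The sketch below uses made-up sample data and a hypothetical selectivity helper; it only illustrates the arithmetic and does not predict what any particular optimizer will do:

```python
def selectivity(qualifying_rows, total_rows):
    """Fraction of rows that match a condition (lower = more selective)."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    return qualifying_rows / total_rows


# Made-up sample: 1000 rows, most of them 'active'.
statuses = ["active"] * 900 + ["banned"] * 100

active_ratio = selectivity(statuses.count("active"), len(statuses))
banned_ratio = selectivity(statuses.count("banned"), len(statuses))

print(f"status = 'active' matches {active_ratio:.0%} of rows")
print(f"status = 'banned' matches {banned_ratio:.0%} of rows")
```

In this sample a filter on 'active' keeps most of the table, so an index on that column alone would eliminate little; a filter on 'banned' is far more selective.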

Optimal mySQL table index structure for faster SELECT of a large range of daily data

I am wondering the best format to lay out my data in a mySQL table so that it can be queried in the fastest manner to gather an array of daily values to be further utilized by php.
So far, I have laid out the table as such:
item_id price_date price_amount
1 2000-03-01 22.4
2 2000-03-01 19.23
3 2000-03-01 13.4
4 2000-03-01 14.95
1 2000-03-02 22.5
2 2000-03-02 19.42
3 2000-03-02 13.4
4 2000-03-02 13.95
with item_id defined as an index.
Also, I am using:
"SELECT DISTINCT price_date FROM table_name"
to get an array containing a unique list of dates.
Furthermore, the part of the code that is within a loop (and the focus of my optimization question) is currently written as:
"SELECT price_amount FROM table_name WHERE item_id = 1 ORDER BY price_date"
This second "SELECT" statement is actually within a loop where I am selecting/storing-in-array the daily prices of each item_id requested.
All of this works and pulls the data from MySQL properly. However, both of the SELECT statements listed above take approx. 4-5 seconds per run, and when looping through 100+ products to create a summary, this adds up to a very inefficient/slow information system.
Is there any more-efficient way that I could structure the mySQL table and/or SELECT statements to retrieve the results faster? Perhaps defining a different index on a different column? I have used the EXPLAIN command to return information per the queries but am unsure how to use the EXPLAIN information to increase the efficiency of my queries.
Thanks in advance for any mySQL wizards that may be able to assist.

Single column index
I am using:
"SELECT DISTINCT price_date FROM table_name"
to get an array containing a unique list of dates.
This query can be executed more efficiently if you create an index for the price_date column:
ALTER TABLE table_name ADD INDEX price_idx (price_date);
Multiple column index
Furthermore, the part of the code that is within a loop (and the focus of my optimization question) is currently written as:
"SELECT price_amount FROM table_name WHERE item_id = 1 ORDER BY price_date"
For the second query, you should create an index covering both the item_id and price_date column:
ALTER TABLE table_name ADD INDEX item_price_idx (item_id, price_date);

I know this is a bit late, but i stumbled across this and thought I would throw my thoughts into the mix.
Indexes used well are very helpful in speeding up queries (EXPLAIN shows some really good detail about which indexes, if any, are being chosen for a specific query). However, efficient PHP will help even more.
In your case you do not show the PHP, but it looks like you offer a list of dates and then loop through finding all the items in that date to get the prices. It would be more efficient to do something like the following:
Select item_id, price_amount from table_name where price_date= order by item_id, price_amount
with an index (preferably a Unique Index) on price_date,item_id,price_amount
You then have a single loop through the resultant SQL not a loop with multiple SQL connections (this is especially true if your SQL server is separate from the PHP box as an external network connection can have an overhead).
4-5 seconds for a single query, though, is very slow (by a factor of at least 100x), so it would indicate a problem (a very large table with no key to use) or, potentially, disk issues.
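The idea of replacing the per-item loop of queries with one query and a single pass over the result can be sketched as follows. This variant fetches by a list of item ids rather than by date, which is a swapped-in assumption, and it uses Python with in-memory SQLite in place of PHP and MySQL; prices_by_item is a hypothetical helper name:

```python
import sqlite3
from itertools import groupby
from operator import itemgetter

# In-memory stand-in for the price table from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name (item_id INT, price_date TEXT, price_amount REAL)"
)
conn.execute("CREATE INDEX item_price_idx ON table_name (item_id, price_date)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?)",
    [
        (1, "2000-03-01", 22.4), (2, "2000-03-01", 19.23),
        (3, "2000-03-01", 13.4), (4, "2000-03-01", 14.95),
        (1, "2000-03-02", 22.5), (2, "2000-03-02", 19.42),
        (3, "2000-03-02", 13.4), (4, "2000-03-02", 13.95),
    ],
)


def prices_by_item(conn, item_ids):
    """Fetch daily prices for several items with ONE query, grouped per item."""
    item_ids = list(item_ids)
    if not item_ids:
        return {}
    placeholders = ", ".join("?" for _ in item_ids)
    rows = conn.execute(
        "SELECT item_id, price_date, price_amount FROM table_name "
        f"WHERE item_id IN ({placeholders}) "
        "ORDER BY item_id, price_date",
        item_ids,
    )
    # Rows arrive sorted by item_id, so groupby splits them in a single pass.
    return {
        item_id: [(date, amount) for _, date, amount in group]
        for item_id, group in groupby(rows, key=itemgetter(0))
    }


prices = prices_by_item(conn, [1, 2])
print(prices)
```

Because the ORDER BY matches the (item_id, price_date) index, the rows can come back already in the order the grouping step needs.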

How can I optimize this simple database and query using php and mysql?

I pull a range (e.g. limit 72, 24) of games from a database according to which have been voted most popular. I have a separate table for tracking game data, and one for tracking individual votes for a game (rating from 1 to 5, one vote per user per game). A game is considered "most popular" or "more popular" when that game has the highest average rating of all the rating votes for said game. Games with less than 5 votes are not considered. Here is what the tables look like (two tables, "games" and "votes"):
games:
gameid(key)
gamename
thumburl
votes:
userid(key)
gameid(key)
rating
Now, I understand that there is something called an "index" which can speed up my queries by essentially pre-querying my tables and constructing a separate table of indices (I don't really know.. that's just my impression).
I've also read that mysql operates fastest when multiple queries can be condensed into one longer query (containing joins and nested select statements, I presume).
However, I am currently NOT using an index, and I am making multiple queries to get my final result.
What changes should be made to my database (if any -- including constructing index tables, etc.)? And what should my query look like?
Thank you.

Your query that calculates the average for every game could look like:
SELECT gamename, AVG(rating)
FROM games INNER JOIN votes ON games.gameid = votes.gameid
GROUP BY games.gameid
HAVING COUNT(*)>=5
ORDER BY avg(rating) DESC
LIMIT 0,25
You must have an index on gameid on both games and votes. (if you have defined gameid as a primary key on table games that is ok)
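To see the query above produce a ranking, here is a small sketch that runs it against made-up data. It uses Python with in-memory SQLite as a stand-in for MySQL; the sample games, the votes_gameid_idx index name and the composite primary key on votes (enforcing one vote per user per game) are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE games (gameid INTEGER PRIMARY KEY, gamename TEXT, thumburl TEXT);
    CREATE TABLE votes (userid INT, gameid INT, rating INT,
                        PRIMARY KEY (userid, gameid));   -- one vote per user per game
    CREATE INDEX votes_gameid_idx ON votes (gameid);     -- supports the join
""")
conn.executemany(
    "INSERT INTO games VALUES (?, ?, ?)",
    [(1, "Alpha", ""), (2, "Beta", ""), (3, "Gamma", "")],
)
votes = (
    [(user, 1, 5) for user in range(1, 6)]                # Alpha: 5 votes, avg 5.0
    + [(user, 2, rating)                                  # Beta: 6 votes, avg 3.5
       for user, rating in zip(range(1, 7), [3, 4, 3, 4, 3, 4])]
    + [(user, 3, 5) for user in range(1, 4)]              # Gamma: only 3 votes
)
conn.executemany("INSERT INTO votes VALUES (?, ?, ?)", votes)

top = conn.execute("""
    SELECT gamename, AVG(rating) AS avg_rating
    FROM games INNER JOIN votes ON games.gameid = votes.gameid
    GROUP BY games.gameid
    HAVING COUNT(*) >= 5
    ORDER BY avg_rating DESC
    LIMIT 0, 25
""").fetchall()

for name, avg_rating in top:
    print(name, avg_rating)
```

Gamma has a perfect average but fewer than 5 votes, so the HAVING clause drops it before the ranking is sorted.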

According to the MySQL documentation, an index is created when you designate a primary key at table creation. This is worth mentioning, because not all RDBMS's function this way.
I think you have the right idea here, with your "votes" table acting as a bridge between "games" and "user" to handle the many-to-many relationship. Just make sure that "userid" and "gameid" are indexed on the "votes" table.

If you have access to use InnoDB storage for your tables, you can create foreign keys on gameid in the votes table which will use the index created for your primary key in the games table. When you then perform a query which joins these two tables (e.g. ... INNER JOIN votes ON games.gameid = votes.gameid) it will use that index to speed things up.
Your understanding of an index is essentially correct — it basically creates a separate lookup table which it can use behind the scenes when the query is executed.
When using an index it is useful to use the EXPLAIN syntax (simply prepend your SELECT with EXPLAIN to try this out). The output it gives show you the list of possible keys available for the query as well as which key the query is using. This can be very helpful when optimising your query.

An index is a PHYSICAL DATA STRUCTURE which is used to help speed up retrieval-type queries; it's not simply a table upon a table, although that works as a concept. Another concept is the way the index at the back of your textbook works (the only difference is that with your book a search key could point to multiple pages / matches, whereas with indexes a search key points to only one page/match). An index is defined by its data structure, so you could use a B+ tree index, and there are even hash indexes. It's database/query optimization at the physical/internal level of the database - I'm assuming you know that you're working at the higher levels of the DBMS, which is easier. An index is rooted in the internal levels, and that makes DB query optimization much more effective and interesting.
I've noticed from your question that you have not even developed the query yet. Focus on the query first. Indexing comes after; as a matter of fact, in any graduate or postgraduate database course, indexing falls under the maintenance of a database and not necessarily the development.
Also N.B. I have seen quite a lot of people say, as a rule, to make all primary keys indexes. This is not true. There are many instances where a primary key index would slow down the database. In fact, if we were to go with only primary indexes then we should use hash indexes, since they work better than B+ trees!
In summary, it doesn't make sense to ask for a query and an index in the same question. Ask for help with the query first. Then, given your tables (relational schema) and SQL query, then and only then could I advise you on the best index - remember, it's maintenance. We can't do maintenance if there is 0 development.
Kind Regards,
N.B. most questions concerning indexes at the postgraduate level of many computing courses are as follows: we give the students a relational schema (i.e. your tables) and a query and then ask them to critically suggest a suitable index for that query on those tables. We can't ask a question like this if they don't have a query.

Running queries on tables with more than 1 million rows in

I am indexing all the columns that I use in my Where / Order by, is there anything else I can do to speed the queries up?
The queries are very simple, like:
SELECT COUNT(*)
FROM TABLE
WHERE user = id
AND other_column = 'something'
I am using PHP 5, MySQL client version: 4.1.22 and my tables are MyISAM.

Talk to your DBA. Run your local equivalent of showplan. For a query like your sample, I would suspect that a covering index on the columns id and other_column would greatly speed up performance. (I assume user is a variable or niladic function).
A good general rule is the columns in the index should go from left to right in descending order of variance. That is, that column varying most rapidly in value should be the first column in the index and that column varying least rapidly should be the last column in the index. Seems counter intuitive, but there you go. The query optimizer likes narrowing things down as fast as possible.

If all your queries include a user id then you can start with the assumption that userid should be included in each of your indexes, probably as the first field. (Can we assume that the user id is highly selective? i.e. that any single user doesn't have more than several thousand records?)
So your indexes might be:
user + otherfield1
user + otherfield2
etc.
If your user id is really selective, like several dozen records, then just the index on that field should be pretty effective (sub-second return).
What's nice about a "user + otherfield" index is that mysql doesn't even need to look at the data records. The index has a pointer for each record and it can just count the pointers.
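The point about counting straight from a "user + otherfield" index can be illustrated with a small sketch. It uses Python with in-memory SQLite as a stand-in (an assumption; the question is about MySQL with MyISAM tables, where the analogous hint is "Using index" in the Extra column of EXPLAIN), and the table name t, the index name and the sample rows are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, user INT, other_column TEXT, payload TEXT)"
)
# Composite index holding every column the COUNT query touches.
conn.execute("CREATE INDEX user_other_idx ON t (user, other_column)")
conn.executemany(
    "INSERT INTO t (user, other_column, payload) VALUES (?, ?, ?)",
    [(n % 10, "something" if n % 2 else "else", "x" * 50) for n in range(1000)],
)

sql = "SELECT COUNT(*) FROM t WHERE user = ? AND other_column = ?"
params = (3, "something")

count = conn.execute(sql, params).fetchone()[0]
plan = " | ".join(
    row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)
)

print("matching rows:", count)
print("plan:", plan)
```

SQLite reports a covering index when the query can be answered from the index alone, without reading the table rows, which is the effect the answer above describes.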
