OK, so I have an SQL query here:
SELECT a.id,... FROM article AS a WHERE a.type=1 AND a.id=3765 ORDER BY a.datetime DESC LIMIT 1
I wanted to get an exact article by country and id, and for that I created an index on two columns, type and id. id is also the primary key.
I used EXPLAIN to see which index is used, and instead of the multiple-column index it used the primary key index, even though I wrote the WHERE conditions in exactly the same order as the index columns.
Does MySQL use the primary key index instead of the multiple-column index because the primary key is faster? Or should I force MySQL to use the multiple-column index?
P.S. I just noticed it was stupid to use ORDER BY when there is only 1 result row. Haha. It increased the search time by 0.0001 seconds. :P
I don't KNOW, but I would THINK that the primary key index would be the fastest available. And if it is, there's not much use in using any other index. You're either going to have an article with an id of 3765 or you're not. Checking that single row to see whether the type matches is trivial.
If you're only returning one row, there's no point to your ORDER BY clause. And the only point to the a.type=1 is to reject an article with the right id if the type is not correct.
In current MySQL versions an InnoDB table can have up to 64 secondary indexes, and each index can include up to 16 columns. A multiple-column (composite) index can be thought of as a sorted array whose entries are created by concatenating the values of the indexed columns. MySQL uses multiple-column indexes in such a way that queries are fast when you specify a known value for the first column of the index in a WHERE clause, even if you do not specify values for the other columns.
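To make the "sorted array" picture concrete, here is a toy Python model (not MySQL's actual storage format): a composite index on (type, id) is a sorted list of (type, id) tuples. A known value for the first column narrows the search to one contiguous range via binary search, while a lookup on id alone gets no help from the sort order and has to check every entry. The article rows are made up for illustration.

```python
from bisect import bisect_left, bisect_right

# Made-up article rows as (id, type); id is unique, like a primary key.
ARTICLES = [(3765, 1), (12, 2), (40, 1), (3766, 2), (7, 1)]

# Toy composite index on (type, id): a sorted list of (type, id) tuples,
# i.e. the "sorted array of concatenated column values".
TYPE_ID_INDEX = sorted((article_type, article_id) for article_id, article_type in ARTICLES)


def ids_for_type(type_value):
    """Leftmost-prefix lookup: binary-search the contiguous range where type == type_value."""
    lo = bisect_left(TYPE_ID_INDEX, (type_value,))
    hi = bisect_right(TYPE_ID_INDEX, (type_value, float("inf")))
    return [article_id for _, article_id in TYPE_ID_INDEX[lo:hi]]


def types_for_id(id_value):
    """No value for the first column: the sort order does not help, so every entry is checked."""
    return [article_type for article_type, article_id in TYPE_ID_INDEX if article_id == id_value]


print(ids_for_type(1))     # [7, 40, 3765]
print(types_for_id(3765))  # [1]
```

The same reasoning is why an index on (type, id) can serve WHERE type = 1 on its own, but not WHERE id = 3765 on its own.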
If you look carefully at how MySQL uses indexes, you will find that they are used to find rows with specific column values quickly. Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows.
In MySQL, a primary key is automatically indexed, because it is implemented as a unique index; this is true whether or not the column uses AUTO_INCREMENT. On the other hand, one should not go overboard with indexing. While indexes improve the speed of reading from a database, they slow down writes (inserts, updates, and deletes), because the changes must also be recorded in each index. Indexes are best used on columns:
that are frequently used in the WHERE part of a query
that are frequently used in an ORDER BY part of a query
that have many different values (columns with numerous repeating values ought not to be indexed).
So I use the primary key whenever it is sufficient for my queries. Only when more indexing is needed for faster retrieval of records do I use composite indexes.
Hope it helps.
The primary key is unique, so there's no need for MySQL to check any other index. a.id=3765 guarantees that there will be no more than one row returned. If a.type=1 is false for that row, then nothing will be returned.
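A rough Python model of that reasoning, assuming a made-up in-memory table keyed by id: one probe on the unique key yields at most one row, and the type condition is then just a check on that row. This only illustrates the logic; it is not how MySQL's optimizer is implemented.

```python
# Made-up table held as {primary key id: row}; illustration only.
ARTICLES_BY_ID = {
    3765: {"id": 3765, "type": 1, "title": "first"},
    3766: {"id": 3766, "type": 2, "title": "second"},
}


def find_article(article_id, article_type):
    """Mimics WHERE id = ? AND type = ?: one unique-key probe, then a check on that row."""
    row = ARTICLES_BY_ID.get(article_id)  # at most one row can match a unique key
    if row is None or row["type"] != article_type:
        return None  # either no such id, or the one candidate has the wrong type
    return row


print(find_article(3765, 1)["title"])  # first
print(find_article(3765, 2))           # None
```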
I have a problem creating the index described in an answer to this question: sql unique constraint on a 2 columns combination
I am using MySQL, and I received a syntax error. My version of the query is as follows:
CREATE UNIQUE INDEX ON friends (LEAST(userID, friendID), GREATEST(userID, friendID));
The LEAST and GREATEST functions are available in MySQL, so maybe the syntax should be different?
I tried an ALTER TABLE version, but it did not work either.
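In MySQL versions before 8.0.13, you can't use function expressions as index values (8.0.13 added functional key parts, which do allow them). Also note that MySQL's CREATE INDEX requires an index name, which the statement above is missing.
The documentation does not state this explicitly; however, it is a basic characteristic of an index to only support "fixed" data: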
Indexes are used to find rows with specific column values quickly. Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows.
Generally, this "fixed" data is an individual column/field; for string fields (such as VARCHAR or TEXT) you can use a prefix index instead of the entire column. Check out CREATE INDEX for more info on that.
The unique index you're trying to create in your example serves mainly as a constraint; it isn't really a beneficial index for searching the table. However, if you index your table on userID, friendID, a SELECT that uses the LEAST() and GREATEST() functions may still benefit from that index, so that could be what you're after in this case.
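For comparison, SQLite (3.9 and later) accepts expressions in an index, so the original idea can be tried there. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL; two-argument min() and max() play the role of LEAST() and GREATEST(), and the index name ux_friend_pair is made up. On MySQL 8.0.13 or later, the corresponding statement should be something like CREATE UNIQUE INDEX ux_friend_pair ON friends ((LEAST(userID, friendID)), (GREATEST(userID, friendID))); with an index name and extra parentheses around each expression, but check it against your server version.

```python
import sqlite3


def make_friends_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE friends (userID INTEGER NOT NULL, friendID INTEGER NOT NULL)")
    # Expression index: (smaller id, larger id) is the same for (1, 2) and (2, 1),
    # so the UNIQUE index treats the pair as unordered.
    conn.execute(
        "CREATE UNIQUE INDEX ux_friend_pair "
        "ON friends (min(userID, friendID), max(userID, friendID))"
    )
    return conn


def add_friendship(conn, a, b):
    """Returns True if inserted, False if the pair already exists in either order."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO friends (userID, friendID) VALUES (?, ?)", (a, b))
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_friends_db()
print(add_friendship(conn, 1, 2))  # True
print(add_friendship(conn, 2, 1))  # False: same pair, reversed
```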
I want to remove an index from a table whose indexed column is never used by my PHP code. The index takes up extra space and I am trying to trim it. It's a table of phone numbers; a phone number is linked to a user profile's id, so the table has three columns: id (indexed), number, and person. I was wondering whether removing the index will affect the queries that use number or person in the WHERE clause. My gut feeling is that it shouldn't, but I am afraid computer science doesn't work on gut feelings. The data is accessed via joins. For example...
SELECT *
FROM people ... LEFT JOIN
phoneNumbers
ON people.id = phoneNumbers.person
Edit: Apparently no one seems to be able to answer the question in the title.
In the case you show, only the person column would benefit from an index.
Indexes help in basically four cases:
Row restriction, that is finding the rows by value instead of examining every row in the table.
Joining is a subset of row restriction, i.e. each distinct value in the first table looks up matching rows in the second table. Indexing a column that is referenced in the ON clause is done in the same way you would index a column referenced in the WHERE clause.
Sorting, to retrieve rows in index order instead of having to sort the result set as an additional step.
Distinct and Group By, to scan each distinct value in an index.
Covering index, that is when the query needs only the columns that are found in the index.
In the case of InnoDB, every table is treated as an index-organized table based on its primary key, and we should take advantage of this because primary key lookups are very efficient. So if you can make phoneNumbers.person part of the primary key (as its leading column), that would be best.
I think it is a good idea for all tables to have explicit primary keys, and an index necessarily comes with them. Without one, for instance, it becomes difficult to delete rows from the table if unwanted duplicates were to appear.
In general, indexes are used for WHERE clauses, ON clauses, and ORDER BY. If you have an id column, then foreign key references to the table should use that column, and not the other two columns. The index might also be used for a SELECT COUNT(*) FROM table query, but I'm not 100% sure whether MySQL does this.
If removing an index on a column makes that big a difference, then you should be investigating other ways to make your database more efficient. One method would be partitioning, which stores different parts of a table in different files.
If the id column is an auto-incrementing integer, you have already indexed the table in the most efficient way possible. Removing it means InnoDB has to cluster the table on something else: a (number, person) primary key if you declare one, otherwise the first UNIQUE NOT NULL index, or failing that a hidden, internally generated row ID. A wide composite key like (number, person) can make look-ups less efficient.
Additionally, every secondary index you create in the future will contain the indexed column(s) in the desired order, followed by the table's primary key. If you remove the id column, use (number, person) as the primary key, and later decide to index the table on person, every entry in that index would carry the whole (number, person) key, so the index would be about as large as the table itself.
Given that you're querying on this relationship, the person column should be indexed, and leaving the id column in place will ensure that the person index is as small and as quick as possible.
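The point about the primary key riding along in every secondary index entry can be sketched with a toy Python model of InnoDB's layout (a heavy simplification, with invented rows): rows are stored by id, and the index on person stores (person, id) pairs. A lookup finds the ids in the index first and then fetches each row by id, so a small integer id keeps every index entry small.

```python
from bisect import bisect_left, bisect_right

# Made-up phoneNumbers rows stored by primary key id, the way InnoDB
# clusters a table on its primary key (greatly simplified).
ROWS_BY_ID = {
    1: {"id": 1, "number": "555-0100", "person": 7},
    2: {"id": 2, "number": "555-0101", "person": 9},
    3: {"id": 3, "number": "555-0102", "person": 7},
}

# Toy secondary index on person: each entry is (person, id), i.e. the
# indexed column followed by the primary key value.
PERSON_INDEX = sorted((row["person"], row_id) for row_id, row in ROWS_BY_ID.items())


def numbers_for_person(person):
    """Find primary keys in the secondary index, then fetch each row by id."""
    lo = bisect_left(PERSON_INDEX, (person,))
    hi = bisect_right(PERSON_INDEX, (person, float("inf")))
    return [ROWS_BY_ID[row_id]["number"] for _, row_id in PERSON_INDEX[lo:hi]]


print(PERSON_INDEX)           # [(7, 1), (7, 3), (9, 2)]
print(numbers_for_person(7))  # ['555-0100', '555-0102']
```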
The column "id" seems useless. If I've understood you correctly, I'd
drop the "id" column,
add a primary key constraint on {person, number}, and
a foreign key reference from "person" to people.id.
I'm assuming each person can have more than one phone number.
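A sketch of that schema, run against SQLite through Python's sqlite3 module rather than MySQL (SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, whereas InnoDB enforces declared foreign keys by default). The people table contents and phone numbers are made up. In InnoDB, a PRIMARY KEY (person, number) would also cluster the rows by person.

```python
import sqlite3


def make_phone_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless asked
    conn.executescript("""
        CREATE TABLE people (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        -- No surrogate id: the (person, number) pair is the primary key,
        -- and person must refer to an existing people.id.
        CREATE TABLE phoneNumbers (
            person INTEGER NOT NULL REFERENCES people(id),
            number TEXT    NOT NULL,
            PRIMARY KEY (person, number)
        );
        INSERT INTO people (id, name) VALUES (7, 'Ann');
    """)
    return conn


def add_number(conn, person, number):
    """True if inserted; False on a duplicate pair or an unknown person."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO phoneNumbers (person, number) VALUES (?, ?)", (person, number))
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_phone_db()
print(add_number(conn, 7, "555-0100"))   # True
print(add_number(conn, 7, "555-0100"))   # False: duplicate (person, number)
print(add_number(conn, 99, "555-0199"))  # False: no person 99
```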
Creating a primary key constraint has a side-effect that you might not want. It creates an internal index on the key columns.
What is the purpose of a secondary key? Say I have a table that logs all the check-ins (similar to Foursquare), with columns id, user_id, location_id, post, time, and there can be millions of rows. Many people have suggested using secondary keys to speed up the process.
Why does this work? And should both user_id and location_id be secondary keys?
I'm using MySQL btw...
Edit: There will be a page that lists/calculates all the check-ins for a particular user, and another page that lists all the users who have checked in to a particular location.
MySQL queries
Type 1
SELECT location_id FROM checkin WHERE user_id = 1234
SELECT user_id FROM checkin WHERE location_id = 4321
Type 2
SELECT COUNT(location_id) as num_users FROM checkin
SELECT COUNT(user_id) as num_checkins FROM checkin
The key (also called an index) is for speeding up queries. If you want to see all check-ins for a given user, you need a key on the user_id field. If you want to see all check-ins for a given location, you need an index on the location_id field. You can read more in the MySQL documentation.
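A minimal sketch of the checkin table with one index per lookup direction, using Python's sqlite3 module as a stand-in for MySQL; the index names and sample rows are made up. Each of the two page queries can then be answered through its own index instead of reading every check-in (whether the planner actually uses the index is its decision, which EXPLAIN shows).

```python
import sqlite3


def make_checkin_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE checkin (
            id          INTEGER PRIMARY KEY,
            user_id     INTEGER NOT NULL,
            location_id INTEGER NOT NULL,
            post        TEXT,
            time        TEXT
        );
        -- One secondary index per lookup direction.
        CREATE INDEX ix_checkin_user     ON checkin (user_id);
        CREATE INDEX ix_checkin_location ON checkin (location_id);
    """)
    conn.executemany(
        "INSERT INTO checkin (user_id, location_id, post, time) VALUES (?, ?, ?, ?)",
        [(1234, 4321, "hi", "t1"), (1234, 10, "hello", "t2"), (55, 4321, "hey", "t3")],
    )
    conn.commit()
    return conn


def locations_for_user(conn, user_id):
    # Can be served by ix_checkin_user.
    sql = "SELECT location_id FROM checkin WHERE user_id = ? ORDER BY location_id"
    return [location_id for (location_id,) in conn.execute(sql, (user_id,))]


def users_for_location(conn, location_id):
    # Can be served by ix_checkin_location.
    sql = "SELECT user_id FROM checkin WHERE location_id = ? ORDER BY user_id"
    return [user_id for (user_id,) in conn.execute(sql, (location_id,))]


conn = make_checkin_db()
print(locations_for_user(conn, 1234))  # [10, 4321]
print(users_for_location(conn, 4321))  # [55, 1234]
```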
I want to comment on your question and your examples.
Let me strongly suggest that, since you are using MySQL, you make sure your tables use the InnoDB engine, for many reasons you can research on your own.
One important feature of InnoDB is referential integrity. What does that mean? In your checkin table, you have a foreign key, user_id, which is the primary key of the user table. With referential integrity (and a declared FOREIGN KEY constraint), MySQL will not let you insert a row with a user_id that doesn't exist in the user table. With MyISAM, you can. That alone should be enough to make you want to use InnoDB.
To your question about keys/indexes, essentially when a table is defined and a key is declared for a column or some combination of columns, mysql will create an index.
Indexes are essential for performance as a table grows with the insert of rows.
Most relational databases, and many document databases, rely on some implementation of B-tree indexing. What B-trees are very good at is finding an item (or establishing that it isn't there) in a predictable number of lookups. So when people talk about the performance of a relational database, the essential building block is the use of B-tree indexes, which are created via KEY clauses in CREATE TABLE or with ALTER TABLE or CREATE INDEX statements.
To understand why this is, imagine that your user table was simply a text file, with one line per row, perhaps separated by commas. As you add a row, a new line in the text file gets added at the bottom.
Eventually you get to the point that you have 10,000 lines in the file.
Now you want to find out if you entered a line for one particular person with the last name of Smith. How can you find that out?
Without sorting the file or keeping a separate index, you have only one option: start at the first line and scan through every line in the file looking for a match. Even if you find a Smith, it might not be the only 'Smith' in the file, so you have to read the entire file from top to bottom every time you want to do this search.
Obviously as the table grows the performance of searching gets worse and worse.
In relational database parlance, this is known as a "table scan". The database has to start at the first row and scan through reading every row until it gets to the end.
Without indexes, relational databases still work, but they are highly dependent on IO performance.
With a B-tree index, the rows you want are found in the index first. The index entries point directly to the data you want, so the table no longer needs to be scanned; instead, only the individual data pages required are read. This is how a database can maintain adequate performance even with millions, or tens or hundreds of millions, of rows.
To really start gaining insight into how MySQL works, you need to get familiar with EXPLAIN (EXPLAIN EXTENDED in older versions) and start looking at the execution plans for queries. Simple queries like the ones you've provided will have simple plans that show how many rows are examined to get a result and whether one or more indexes are used.
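The difference between a table scan and an index lookup can be sketched in Python with the 10,000-line "text file" from the example above. Binary search over a sorted copy stands in for the index here; a real B-tree stores many keys per node, so it needs even fewer page reads, but the logarithmic growth is the same idea. The names are made up.

```python
def table_scan(lines, name):
    """No index: read every line, since another match could appear anywhere."""
    matches = [position for position, value in enumerate(lines) if value == name]
    return matches, len(lines)  # (matching positions, lines read)


def indexed_lookup(sorted_names, name):
    """Binary search over sorted keys, standing in for an index lookup."""
    lo, hi, probes = 0, len(sorted_names), 0
    while lo < hi:
        probes += 1
        mid = (lo + hi) // 2
        if sorted_names[mid] < name:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_names) and sorted_names[lo] == name
    return found, probes  # (whether it exists, keys examined)


# 10,000 made-up "lines", two of which are Smith.
lines = [f"person{i:05d}" for i in range(9998)] + ["Smith", "Smith"]
print(table_scan(lines, "Smith"))              # ([9998, 9999], 10000)
print(indexed_lookup(sorted(lines), "Smith"))  # (True, n) with n <= 14
```

Doubling the file to 20,000 lines doubles the scan's work but adds only about one probe to the indexed lookup.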
For your summary queries, indexes are not helpful because you are doing a COUNT(). The table will need to be scanned when you have no other criteria constraining the search.
I did notice what looks like a mistake in your summary queries. Just based on your labels, I would think that these are the right queries to get what you would want given your column alias names.
SELECT COUNT(DISTINCT user_id) as num_users FROM checkin
SELECT COUNT(*) as num_checkins FROM checkin
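A quick check of why the original summary queries don't match their aliases, using Python's sqlite3 module on a few made-up rows (these COUNT semantics are standard SQL and behave the same way in MySQL): COUNT(location_id) counts rows with a non-NULL location_id, not users.

```python
import sqlite3


def summary_counts(conn):
    """Compare the original summary query with the corrected ones."""
    def scalar(sql):
        return conn.execute(sql).fetchone()[0]

    return {
        # Original: counts rows with a non-NULL location_id, not distinct users.
        "original_num_users": scalar("SELECT COUNT(location_id) AS num_users FROM checkin"),
        "num_users": scalar("SELECT COUNT(DISTINCT user_id) AS num_users FROM checkin"),
        "num_checkins": scalar("SELECT COUNT(*) AS num_checkins FROM checkin"),
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE checkin (id INTEGER PRIMARY KEY, user_id INTEGER, location_id INTEGER)")
conn.executemany(
    "INSERT INTO checkin (user_id, location_id) VALUES (?, ?)",
    [(1, 10), (1, 11), (2, 10), (3, 12)],  # made-up rows: 3 users, 4 check-ins
)
print(summary_counts(conn))
# {'original_num_users': 4, 'num_users': 3, 'num_checkins': 4}
```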
This is yet another reason to use InnoDB, which, when properly configured, has a data cache (the InnoDB buffer pool) similar to other RDBMSs like Oracle and SQL Server. MyISAM caches only index blocks (in its key cache) and leaves data caching to the operating system, so if you repeatedly run the same sorts of queries that require a lot of IO, MySQL may have to do all that data reading work over and over, whereas with InnoDB that data could very well be sitting in memory and the result returned without going back to storage.
Primary vs Secondary
There really is no such concept internally. A primary key is special because it allows the database to find one single row. Primary keys must be unique, and to reflect that, the associated B-tree index is unique, which simply means that it will not allow two entries with the same key value to exist in the index.
Whether or not an index is unique is an excellent tool that allows you to maintain the consistency of your database in many other cases. Let's say you have an 'employee' table with an SS_Number column to store the social security #. It makes sense to have an index on that column if you want the system to support finding an employee by SS number. Without an index, you will get a table scan. But you also want that index to be unique, so that once an employee with a given SS# is inserted, there is no way the database will let you enter a duplicate employee with the same SS#.
But to demystify this for you: when you declare keys while defining your tables, the indexes are created for you and, in most cases, used automatically.
It's when you aren't dealing with keys (primary or foreign), as in the example of usernames, first and last names, SS#s, etc., that you also need to know how to create an index, because you are searching (using WHERE clause criteria) on one or more columns that aren't keys.
I have some location data, which is in a table locations with the key being the unique location_id
I have some user data, which is in a table users with the key being the unique user_id
Two ways I was thinking of linking these two together:
I can put the 'location' in each user's data.
'SELECT user_id FROM users WHERE location = "LOCATIONID";'
//this IS NOT searching with the table's key
//this does not require an explode
//this stores 1 integer per user
I can also put the 'userIDs' as a comma delimited string of ids into each location's data.
'SELECT userIDs FROM locations WHERE location_id = "LOCATIONID";'
//this IS searching with the tables key
//this needs an explode() once the comma delimited list is retrieved
//this stores 1 string of user ids per location
So I wonder which would be most efficient. I'm not really sure how much the size of the stored data could also affect the speed. I want retrievals that are as fast as possible when trying to find out which users are at which location.
This is just an example, and there will be many other tables like location to compare to the users, so the efficiency, or lack of, will be multiplied across the whole system.
Stick with option 1. Keep your database tables normalised as much as possible till you know you have a performance problem.
There's a whole slew of problems with option 2, including not being able to use the user IDs until you pull them into PHP, and then having to fire off lots more SQL queries, one for each ID. This is extremely inefficient. Do as much inside MySQL as possible; the optimisations the database layer can do while running the query will easily be a lot quicker than anything you write in PHP.
Regarding your point about not searching on the primary key, you should add an index to the location column. All columns that are in a WHERE clause should be indexed as a general rule. This negates the issue of not searching on the primary key, as the primary key is just another type of index for the purposes of performance.
Use the first one to keep your data normalized. You can then query for all users for a location directly from the database without having to go back to the database for each user.
Be sure to add the correct index on your users table too.
CREATE TABLE locations (
locationId INT PRIMARY KEY AUTO_INCREMENT
) ENGINE=INNODB;
CREATE TABLE users (
userId INT PRIMARY KEY AUTO_INCREMENT,
location INT,
INDEX ix_location (location)
) ENGINE=INNODB;
Or, to add only the index:
ALTER TABLE users ADD INDEX ix_location(location);
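Option 1 end to end, as a sketch using Python's sqlite3 module instead of MySQL; the tables mirror the CREATE TABLE statements above, and the sample rows are made up. Finding every user at a location is a single indexed query, with no explode() step and no follow-up query per user.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE locations (locationId INTEGER PRIMARY KEY);
        CREATE TABLE users (
            userId   INTEGER PRIMARY KEY,
            location INTEGER REFERENCES locations(locationId)
        );
        CREATE INDEX ix_location ON users (location);
        -- Made-up sample data.
        INSERT INTO locations (locationId) VALUES (1), (2);
        INSERT INTO users (userId, location) VALUES (10, 1), (11, 2), (12, 1);
    """)
    return conn


def users_at(conn, location_id):
    """Option 1: one query on the indexed users.location column."""
    sql = "SELECT userId FROM users WHERE location = ? ORDER BY userId"
    return [user_id for (user_id,) in conn.execute(sql, (location_id,))]


conn = make_db()
print(users_at(conn, 1))  # [10, 12]
```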
Have you heard of foreign keys?
You can get details from many tables using a JOIN.
You can also use a subquery.
As you said, there are two tables, users and locations.
Keep userid as a foreign key in locations and fetch based on that.
When you store the user IDs as a comma-separated list in a table, that table is not normalized (especially it violates the first normal form, item 4).
It is perfectly valid to denormalize tables for optimization purposes. But only after you have measured that this is where the bottleneck actually is in your specific situation. This, however, can only be determined if you know which query is executed how often, how long they take and whether the performance of the query is critical (in relation to other queries).
Stick with option 1 unless you know exactly why you have to denormalize your table.
For example, I'm doing the following:
SELECT COUNT(id)
FROM users
WHERE unique_name = 'Wiliam'
// if Wiliam doesn't exist, then...
INSERT INTO users
SET unique_name = 'Wiliam'
The question is: I'm doing the SELECT COUNT(id) check every time I insert a new user, whether or not I use a unique key. So, if unique_name has a UNIQUE key, will that perform better than a normal key?
What you mean is a UNIQUE CONSTRAINT on the column that will be updated. Reads will be faster; inserts will be just a bit slower. It will still be faster than having your code check first and then insert the value, though. Just let MySQL do its thing and return an error to you if the value is not unique.
You didn't say what this is for, which would help. If it's part of an authentication system, then why doesn't your query include the user's password as well? If it's not, a unique indexed column used to store names isn't going to work very well in a real-world system unless you are OK with having one and only one Wiliam in your system. (Was that supposed to be William?)
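The "just insert and let the database object" approach, sketched with Python's sqlite3 module standing in for MySQL; the table layout is assumed. In MySQL the same situation surfaces as a duplicate-key error (error 1062, ER_DUP_ENTRY) from your driver.

```python
import sqlite3


def make_users_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, unique_name TEXT NOT NULL UNIQUE)")
    return conn


def register(conn, name):
    """Insert without a prior SELECT; the UNIQUE constraint rejects duplicates."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO users (unique_name) VALUES (?)", (name,))
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_users_db()
print(register(conn, "Wiliam"))  # True
print(register(conn, "Wiliam"))  # False: duplicate name
```

This replaces the SELECT COUNT(id) round trip with a single statement, and the constraint still protects the table if two requests race to insert the same name.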
And if that name field is really unique you do not need to use COUNT(ID) in your query. If 'unique_name' is truly unique you either get an id number returned from your query or you get nothing.
You'd want something like this:
SELECT id FROM users WHERE unique_name = 'Wiliam'
No record returned, no Wiliam.
An index (unique or non-unique -- I don't know what you're after here) on unique_name will improve the performance.
Your use of 'unique key' isn't very logical so I suspect you are getting confused about the nomenclature of keys, indexes, their relationships, and the purposes for them.
KEYS in a database are used to create and identify relationships between sets of data. This is what makes the 'relational' possible in a relational database.
Keys come in 2 flavors: Primary and foreign.
PRIMARY KEYS identify each row in a table. The value or values that comprise the key must be unique.
Primary keys can be made from a single column or from several columns (in which case it is called a composite key) that together uniquely identify the row. Again, the important thing here is uniqueness.
I use MySQL's AUTO_INCREMENT integer columns for my primary keys.
FOREIGN KEYS identify which rows in a table have a relationship with other rows in other tables. A foreign key of a record in one table is the primary key of the related record in the other table. A foreign key is not unique -- in many-to-many relationships there are by definition multiple records with the same foreign key. They should however be indexed.
INDEXES are used by the database as a sort of short-hand method to quickly look up values, as opposed to scanning the entire table or column for a match. Think of the index in the back of a book. Much easier to find something using a book's index than by flipping through the pages looking for it.
You may also want to index a non-key column for better performance when searching on that column. What column do you use frequently in a WHERE clause? Probably should index it then.
UNIQUE INDEX is an index where all the values in it must be distinct. A column with a unique index will not let you insert a duplicate value, because it would violate the unique constraint. Primary keys are unique indexes. But unique indexes do not have to be primary keys, or even a key.
Hope that helps.
[edited for brevity]
Having a unique constraint is a good thing because it prevents insertion of duplicated entries in case your program is buggy (are you missing a "for update" clause in your select statement?) or in case someone inserts data not using your application.
You should not, however, depend on it in your application for normal operation. Let's assume unique_name is an input field a user can specify. Your application should check whether the name is unique. If it is, insert it. If it is not, tell the user.
It is a bad idea to just try the insert in all cases and see if it succeeds: it will create errors in the database server logs that make it more difficult to find real errors. And depending on the database, a failed insert can render your current transaction useless (PostgreSQL aborts the whole transaction, while InnoDB rolls back only the failing statement), which may be an issue depending on the situation.