Finding the Need to Use Indexes in MySQL - php

I have three or four tables in a MySQL database for an upcoming Android app that may grow to thousands of rows very quickly. At the moment I have about 6-8 SELECT and 2 INSERT SQL commands that need to run.
After doing research, I have found that I will have to use indexing to cut down on load time. I have searched several tutorials on different sites to see if I can pick this up, but I have found nothing that explains clearly what to do or how to do it.
Here's the situation:
First and foremost, it will be using a GoDaddy MySQL server with unlimited bandwidth and 150,000 MB of storage. Here is one table that will be getting lots of use:
items_id (INT 11)
item (VARCHAR 100)
cat_id (INT 11)
In PHPMyAdmin it says for indexes:
Keyname: PRIMARY, Type: PRIMARY, Cardinality: 576, Field: items_id
So it appears there is an index established, correct?
Here is one SQL Query (via PHP) related to this table (SELECT):
"SELECT * FROM items WHERE cat_id = ' ".$_REQUEST['category_id']."' ORDER BY TRIM(LEADING 'The ' FROM item) ASC;"
And another (INSERT):
"INSERT INTO items (item, cat_id) VALUES ('{$newItem}', '{$cat_id}')"
My main questions are: With these methods, am I utilizing the best speed possible and making use of the established indexes? Or does this have "slow" written all over it?

Simple selects / inserts cannot be changed to take advantage of indexes.
But indexes can be added to the tables to make the queries run faster.
Well, actually, inserts don't use indexes for lookups unless you're using InnoDB as a storage engine with foreign key constraints (though note that every index on a table does have to be updated when a row is inserted).
If you're using a column in the WHERE / GROUP BY / ORDER BY clauses of a SELECT statement, you may consider adding an index on it. A good idea would be to use EXPLAIN on the queries in question and see how the database engine uses the columns in the WHERE clause.
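For example, a minimal sketch against the asker's items table (the index name ix_cat_id is arbitrary):
EXPLAIN SELECT * FROM items WHERE cat_id = 123;
-- if the "key" column of the output is NULL, no index is being used; add one:
CREATE INDEX ix_cat_id ON items (cat_id);
-- re-running the EXPLAIN should now show ix_cat_id in the "key" column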
If a column has a small set of non-unique possible values (gender: male/female), it makes little sense to add an index on it by itself, because you won't be searching for all the females or all the males (and scanning half a table is not much different from scanning the full table). But if you use that column along with another column to filter / group / sort, you may want to add a composite index (multi-column index) on them, as sketched below.
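A sketch of that composite-index case (the profiles table and city column here are hypothetical):
-- gender alone is a poor index, but it can serve as one column
-- of a composite index that narrows the search much further:
CREATE INDEX ix_gender_city ON profiles (gender, city);
SELECT * FROM profiles WHERE gender = 'female' AND city = 'Boston';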
Databases within MySQL are organized as folders. The folders contain multiple files for each table.
There's a table definition file, a table data file and some index files. If you define an index for a column or multiple columns, a file for that index will be created.
If you don't have any indexes, not even the primary key, any SELECT statement is going to do a full table scan, which for hundreds of thousands of entries becomes noticeably slow.
If you define an index, it will read all the unique values in the table for that column or set of columns and write a file that lists the correspondences between a given value of that column (or columns) and the records that contain it.
That file should be much smaller than the data file and should usually fit into memory entirely, alongside the other index files. MySQL then only has to intersect the matching record lists in that file to find out which records match the select criteria, and then cherry-pick the data it needs from the data table.
Primary and Unique indexes have a direct correspondence between one value and one record. So searching by unique value is fast.

Related

performance of MYSQL and PHP fetching records

Scenario 1
I have one table, let's say "member". In that table I have 7 fields (memid, login_name, password, age, city, phone, country) and about 10K records. I need to fetch one record, so I'm using a query like this:
mysql_query("select * from member where memid=999");
Scenario 2
I have the same table called "member", but I'm splitting it into two tables: member and member_txt. In member_txt I have (memid, age, phone, city, country) and in member I have (memid, login_name, password).
Which scenario fetches the data more quickly: keeping a single table, or splitting it into two tables linked by a reference?
Note: I need to fetch this particular data with PHP and MySQL. Please let me know which method is best to follow.
For your own health, use the single table approach.
As long as you are using a primary key for memid, things are going to be lightning fast. This is because PRIMARY KEY automatically creates an index, which tells MySQL exactly where the data is located and eliminates the scanning it would otherwise have to do.
From http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
Indexes are used to find rows with specific column values quickly. Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows. The larger the table, the more this costs. If the table has an index for the columns in question, MySQL can quickly determine the position to seek to in the middle of the data file without having to look at all the data. If a table has 1,000 rows, this is at least 100 times faster than reading sequentially. If you need to access most of the rows, it is faster to read sequentially, because this minimizes disk seeks.
Your second approach only makes your system more complex, and provides no benefits.
Use scenario 1.
Make memid a primary/unique key; then having one table is faster than having two tables.
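A minimal sketch of what that looks like, assuming memid is not yet a key:
ALTER TABLE member ADD PRIMARY KEY (memid);
-- or, if another column must stay the primary key, a unique index works too:
-- CREATE UNIQUE INDEX ux_memid ON member (memid);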
In general you should not see too much impact on performance with 10K rows, as long as you access them by the primary key.
Also note that fetching data from one table is faster than fetching it from two tables.
If you want to optimize further, use explicit column names in the SELECT statement instead of the * operator.
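For example (the column choice here is illustrative):
-- instead of: SELECT * FROM member WHERE memid = 999;
SELECT memid, login_name, city FROM member WHERE memid = 999;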

Proper Indexing MySQL

As far as indexing goes, is it proper to index all fields that will be searched on (within a WHERE clause) to speed up SELECTs? For example, my database contains a profiles table which stores user information such as name, intrestCode, zip, description, and email. The profile record is identified by a PRIMARY id column which uniquely corresponds with the userid. I made zip and intrestCode indexes, since profiles will be searched by zip and possibly intrestCode (SELECT `blah`,`blah`... FROM profile WHERE zip=?; SELECT `blah`,`blah`... FROM profile WHERE zip=? && intrestCode=?). Am I doing it right?
Sounds basically correct. You should be aware that MySQL generally can't use two different indexes on the same table in the same query, so if you ran
CREATE INDEX `zip` ON `profile` (`zip`);
CREATE INDEX `intrestCode` ON `profile` (`intrestCode`);
then the query
SELECT `blah`,`blah`... FROM profile WHERE zip=? && intrestCode=?
can only use one of those indexes. The secret here is that you can create a single index on two columns, like so:
CREATE INDEX `zip+intrestCode` ON `profile` (`zip`, `intrestCode`);
MySQL can use this for queries that use either zip alone in the WHERE clause, or use both zip and intrestCode, but not for queries that use only intrestCode in the WHERE clause.
(This is because each index covers the whole table. If MySQL were to try to look up zip and intrestCode from different indexes, it would retrieve lots of irrelevant rows from the second index. Therefore, it only looks at one index. If you want it to use the index for both columns, you need one index that includes both columns.)
Indexing relational databases is a bit of an art. The best approach is to look at all the different queries you will be making and put indexes on the columns involved. Use EXPLAIN to see which indexes a query is going to use.
However, remember that MySQL can only use one index per table at a time. This is why you can put an index on multiple columns at once, and MySQL can use a multi-column index as long as the query filters on the columns at the start of the index. In your example, I would put a multi-column index on zip and intrestCode together, because that will help both queries. That way you don't need a separate index on just zip.
I believe basically yes. You should also monitor your database to catch further problems, and optimize your queries and tables based on the results.
Generally you want to add indexes to columns that you will join on or have in your WHERE clause. Remember that one index on two columns is different from two indexes, one on each column. Also, if you have an index on multiple columns, the order matters.
Say you have two columns A and B and a two-column index in the order A then B. There are three cases (see the sketch after this list):
Where clause against column A: index is used
Where clause against column B: index is NOT used
Where clause against columns A and B: index is used
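Here is that rule as a minimal sketch (the table t is hypothetical):
CREATE TABLE t (A INT, B INT, INDEX ix_ab (A, B));
SELECT * FROM t WHERE A = 1;            -- uses ix_ab
SELECT * FROM t WHERE B = 2;            -- does NOT use ix_ab
SELECT * FROM t WHERE A = 1 AND B = 2;  -- uses ix_ab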
Like staticsan says, indexing is an art and there are no rules that apply 100% of the time. Use the EXPLAIN plan to see how your query is performing and make tweaks accordingly.

Purpose of Secondary Key

What is the purpose of a secondary key? Say I have a table that logs all check-ins (similar to Foursquare), with columns id, user_id, location_id, post, time, and there can be millions of rows. Many people have said to use secondary keys to speed up the process.
Why does this work? And should both user_id and location_id be secondary keys?
I'm using mySQL btw...
Edit: There will be a page that lists/calculates all the check-ins for a particular user, and another page that lists all the users who have checked in to a particular location.
mySQL Query
Type 1
SELECT location_id FROM checkin WHERE user_id = 1234
SELECT user_id FROM checkin WHERE location_id = 4321
Type 2
SELECT COUNT(location_id) as num_users FROM checkin
SELECT COUNT(user_id) as num_checkins FROM checkin
The key (also called an index) is for speeding up queries. If you want to see all check-ins for a given user, you need a key on the user_id field. If you want to see all check-ins for a given location, you need an index on the location_id field. You can read more in the MySQL documentation.
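For the checkin table in the question, that would look something like this (the index names are arbitrary):
ALTER TABLE checkin
    ADD INDEX ix_user (user_id),
    ADD INDEX ix_location (location_id);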
I want to comment on your question and your examples.
Let me just strongly suggest that, since you are using MySQL, you make sure your tables use the InnoDB engine type, for many reasons you can research on your own.
One important feature of InnoDB is that you have referential integrity. What does that mean? In your checkin table, you have a foreign key of user_id which is the primary key of the user table. With referential integrity, MySQL will not let you insert a row with a user_id that doesn't exist in the user table. Using MyISAM, you can. That alone should be enough to make you want to use the innodb engine.
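A sketch of the checkin table with referential integrity; the user and location table names, their id primary keys, and the column types are all assumptions here:
CREATE TABLE checkin (
    id INT PRIMARY KEY AUTO_INCREMENT,
    user_id INT NOT NULL,
    location_id INT NOT NULL,
    post TEXT,       -- assumed type
    time DATETIME,   -- assumed type
    FOREIGN KEY (user_id) REFERENCES user (id),
    FOREIGN KEY (location_id) REFERENCES location (id)
) ENGINE=INNODB;
As a side effect, InnoDB creates an index on each foreign key column if one doesn't already exist, which also gives you the secondary keys discussed above.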
To your question about keys/indexes: essentially, when a table is defined and a key is declared for a column or some combination of columns, MySQL will create an index.
Indexes are essential for performance as a table grows with the insert of rows.
All relational databases (and document databases) depend on an implementation of B-tree indexing. What B-trees are very good at is finding an item (or determining it isn't there) in a predictable number of lookups. So when people talk about the performance of a relational database, the essential building block is the use of B-tree indexes, which are created via KEY clauses or with ALTER TABLE or CREATE INDEX statements.
To understand why this is, imagine that your user table was simply a text file, with one line per row, perhaps separated by commas. As you add a row, a new line in the text file gets added at the bottom.
Eventually you get to the point that you have 10,000 lines in the file.
Now you want to find out if you entered a line for one particular person with the last name of Smith. How can you find that out?
Without any sorting of the file, or a separate index, you have only one option: start at the first line and scan through every line in the table looking for a match. Even if you find a Smith, that might not be the only 'Smith' in the table, so you have to read the entire file from top to bottom every time you do this search.
Obviously as the table grows the performance of searching gets worse and worse.
In relational database parlance, this is known as a "table scan". The database has to start at the first row and scan through reading every row until it gets to the end.
Without indexes, relational databases still work, but they are highly dependent on IO performance.
With a B-tree index, the rows you want are found in the index first. The index entries point directly at the data you want, so the table no longer needs to be scanned; instead, only the individual data pages required are read. This is how a database maintains adequate performance even with millions, tens of millions, or hundreds of millions of rows.
To really start to gain insight into how MySQL works, you need to get familiar with EXPLAIN EXTENDED ... and start looking at the explain plans for your queries. Simple ones like those you've provided will have simple plans that show how many rows are being examined to get a result and whether or not one or more indexes are being used.
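For example, against the first query from the question:
EXPLAIN EXTENDED SELECT location_id FROM checkin WHERE user_id = 1234;
SHOW WARNINGS;  -- prints the query as the optimizer actually rewrote it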
For your summary queries, indexes are not helpful because you are doing a COUNT(). The table will need to be scanned when you have no other criteria constraining the search.
I did notice what looks like a mistake in your summary queries. Based on your column alias names, I think these are the queries you actually want:
SELECT COUNT(DISTINCT user_id) as num_users FROM checkin
SELECT COUNT(*) as num_checkins FROM checkin
This is yet another reason to use InnoDB, which, when properly configured, has a data cache (the InnoDB buffer pool) similar to other RDBMSs like Oracle and SQL Server. MyISAM doesn't cache data at all, so if you repeatedly run the same sorts of queries that require a lot of IO, MySQL has to do all that reading work over and over, whereas with InnoDB that data may well be sitting in cache memory and the result returned without having to go back to storage.
Primary vs Secondary
There really is no such distinction internally. A primary key is special because it allows the database to find one single row. Primary keys must be unique, and to reflect that, the associated B-tree index is unique, which simply means that it will not allow two entries with the same key data to exist in the index.
Whether or not an index is unique is an excellent tool for maintaining the consistency of your database in many other cases. Let's say you have an employee table with an SS_Number column storing the social security number. It makes sense to have an index on that column if you want the system to support finding an employee by SS number; without an index, you will table-scan. But you also want that index to be unique, so that once an employee with a given SS# is inserted, the database will not let you enter a duplicate employee with the same SS#.
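A sketch of that unique index (the index name is arbitrary):
CREATE UNIQUE INDEX ux_ss_number ON employee (SS_Number);
-- lookups by SS# are now fast, and duplicate SS#s are rejected on insert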
But to demystify this for you: when you declare keys as you define your tables, these indexes are simply created for you and used automagically in most cases.
It's when you aren't dealing with keys (primary or foreign), as in the examples of usernames, first and last names, SS#'s, etc., that you need to be aware of how to create an index yourself, because you are searching (using WHERE clause criteria) on one or more columns that aren't keys.

Indexing columns but keeping Inserts fast

I need to index most of the columns in one of my tables because it is being used for log searching by users with lots of filters.
This same table has about 2-3 inserts per second. I know indexing affects inserts on a table so will this be a problem?
I am using the latest version of MySQL (?) with PHP 5.
Thanks.
There are a few tricks you can use to speed up inserts
Index definition
Use an integer autoincrement column as a primary key
Avoid unique keys
Avoid foreign keys if not necessary
Only use integer fields for foreign keys if you must use them
Use partial (prefix) keys for text/string columns (see the sketch after this list)
Do not use multicolumn keys
Consider the cardinality of your indexes; if there are 1,000,000 rows, but only 5 unique values, using a key does not make much sense, a full table scan is faster.
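A sketch of a few of these points on a hypothetical log table (all names and sizes here are illustrative):
CREATE TABLE log (
    id INT AUTO_INCREMENT PRIMARY KEY,  -- integer autoincrement primary key
    message VARCHAR(255),
    INDEX ix_message (message(20))      -- partial (prefix) key on a string column
);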
Inserts
Bundle your inserts if possible; inserting 5 rows in one statement is considerably faster than doing 5 individual inserts. So if you can store your log rows in a non-indexed temp table for 5 minutes and then do an insert-select from that into your indexed table, you will gain a lot of speed (see the sketch after this list).
Use INSERT DELAYED.
You can disable index updates before a bulk insert and enable the index updates afterwards. Do an ALTER TABLE ... DISABLE KEYS before the bulk insert and ALTER TABLE ... ENABLE KEYS after the bulk insert. This will delay the update of all non unique indexes until after all the inserts are done.
Consider using LOAD DATA INFILE; it's the fastest insert mechanism MySQL has.
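A sketch of the bundling and key-toggling ideas, reusing the hypothetical log table from the previous sketch (log_staging is also hypothetical):
-- one statement, several rows: considerably faster than three single inserts
INSERT INTO log (message) VALUES ('a'), ('b'), ('c');
-- or stage rows in a non-indexed table and flush them periodically:
INSERT INTO log (message) SELECT message FROM log_staging;
-- around a bulk load, defer non-unique index updates (MyISAM tables):
ALTER TABLE log DISABLE KEYS;
-- ... bulk inserts / LOAD DATA INFILE here ...
ALTER TABLE log ENABLE KEYS;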
Maintenance
Use SHOW INDEX on your tables to check the stats and look for opportunities for improvement.
Run OPTIMIZE TABLE periodically on your MyISAM, InnoDB and Archive tables.
Choice of engine
Test to see if using MyISAM as a storage engine will improve performance.
Under some loads the Archive storage engine can do faster inserts; test to see if it works faster for you. Archive only allows a single primary index though, so for you this is not an option.
Check out the answers to the following questions on SO:
How to optimize mysql indexes so that INSERT operations happen quickly on a large table with frequent writes and reads?
are MySQL INSERT statements slower in huge tables?
Mysql InnoDB performance optimization and indexing
All I can suggest is to try it and see how it affects performance. Ensure that you only index the columns that will be used in searches, and only include more than one column in an index if it makes sense to do so. Remember that for an index to be usable, its first column must be referenced in a WHERE, JOIN...ON, or ORDER BY clause; the later columns in the index will not be used unless this condition is met.
If you only get 2-3 INSERTs per second, there will be no problem with your indexes (unless there is a FULLTEXT index).

which of these 2 methods is most efficient with PHP/MYSQL

I have some location data, which is in a table locations with the key being the unique location_id
I have some user data, which is in a table users with the key being the unique user_id
Two ways I was thinking of linking these two together:
I can put the 'location' in each user's data.
'SELECT user_id FROM users WHERE location = "LOCATIONID";'
//this IS NOT searching with the table's key
//this does not require an explode
//this stores 1 integer per user
I can also put the 'userIDs' as a comma delimited string of ids into each location's data.
'SELECT userIDs FROM locations WHERE location_id = "LOCATIONID";'
//this IS searching with the table's key
//this needs an explode() once the comma delimited list is retrieved
//this stores 1 string of user ids per location
So I wonder which would be more efficient. I'm not really sure how much the size of the stored data could also impact the speed. I want retrievals to be as fast as possible when trying to find out which users are at which location.
This is just an example, and there will be many other tables like location to compare to the users, so the efficiency, or lack of, will be multiplied across the whole system.
Stick with option 1. Keep your database tables normalised as much as possible till you know you have a performance problem.
There's a whole slew of problems with option 2, including not being able to use the user IDs until you pull them into PHP, and then having to fire off many more SQL queries, one per ID. This is extremely inefficient. Do as much inside MySQL as possible; the optimisations the database layer can make while running the query will easily be quicker than anything you write in PHP.
Regarding your point about not searching on the primary key, you should add an index to the location column. All columns that are in a WHERE clause should be indexed as a general rule. This negates the issue of not searching on the primary key, as the primary key is just another type of index for the purposes of performance.
Use the first one to keep your data normalized. You can then query for all users for a location directly from the database without having to go back to the database for each user.
Be sure to add the correct index on your users table too.
CREATE TABLE locations (
    locationId INT PRIMARY KEY AUTO_INCREMENT
) ENGINE=INNODB;
CREATE TABLE users (
    userId INT PRIMARY KEY AUTO_INCREMENT,
    location INT,
    INDEX ix_location (location)
) ENGINE=INNODB;
Or to only add the index
ALTER TABLE users ADD INDEX ix_location(location);
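With these tables, fetching all users at a given location stays a single indexed query (the literal 4321 is just an example value):
SELECT u.userId
FROM users u
JOIN locations l ON l.locationId = u.location
WHERE l.locationId = 4321;
-- or, without the join, since the filter is on users.location anyway:
SELECT userId FROM users WHERE location = 4321;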
Have you heard of foreign keys?
You can get details from multiple tables using a JOIN.
You can also use a subquery.
As you said, there are two tables, users and locations.
Keep the userid as a foreign key in locations and fetch based on that.
When you store the user IDs as a comma-separated list in a table, that table is not normalized (in particular, it violates the first normal form, item 4).
It is perfectly valid to denormalize tables for optimization purposes, but only after you have measured that this is where the bottleneck actually lies in your specific situation. That, however, can only be determined if you know which queries are executed how often, how long they take, and whether their performance is critical relative to other queries.
Stick with option 1 unless you know exactly why you have to denormalize your table.
