I am having trouble understanding why my MySQL query runs faster when I change it to use no indexes.
My first query takes 0.236s to run:
SELECT
u.id,
u.email,
CONCAT(u.first_name, ' ', u.last_name) AS u_name
FROM
tbl_user AS u
WHERE
u.site_id=1
AND u.role_id=5
AND u.removed_date IS NULL
ORDER BY
u_name ASC
LIMIT 0, 20
My second query takes 0.147s to run:
SELECT
u.id,
u.email,
CONCAT(u.first_name, ' ', u.last_name) AS u_name
FROM
tbl_user AS u USE INDEX ()
WHERE
u.site_id=1
AND u.role_id=5
AND u.removed_date IS NULL
ORDER BY
u_name ASC
LIMIT 0, 20
I have a unique index named idx_1 on columns site_id, role_id and email.
EXPLAIN shows that it will use idx_1:
+----+-------------+-------+------+-------------------------------------+-------+---------+-------------+-------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+-------------------------------------+-------+---------+-------------+-------+----------------------------------------------------+
| 1 | SIMPLE | u | ref | idx_1,idx_import,tbl_user_ibfk_2 | idx_1 | 8 | const,const | 55006 | Using index condition; Using where; Using filesort |
+----+-------------+-------+------+-------------------------------------+-------+---------+-------------+-------+----------------------------------------------------+
The table has about 110000 records.
Thanks
UPDATE 1:
Below is the list of my table indexes:
Name Fields Type Method
---------------------------------------------------------------
idx_1 site_id, role_id, email Unique BTREE
idx_import site_id, external_id Unique BTREE
tbl_user_ibfk_2 role_id Normal BTREE
tbl_user_ibfk_3 country_id Normal BTREE
tbl_user_ibfk_4 preferred_country_id Normal BTREE
---------------------------------------------------------------
You haven't specified which MySQL version you are using. Does this explain it?
Prior to MySQL 5.1.17, USE INDEX, IGNORE INDEX, and FORCE INDEX affect only which indexes are used when MySQL decides how to find rows in the table and how to process joins. They do not affect whether an index is used when resolving an ORDER BY or GROUP BY clause.
from https://dev.mysql.com/doc/refman/5.1/en/index-hints.html
Related
I have a simple table created like this
CREATE TABLE IF NOT EXISTS metadata (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
title varchar(500),
category varchar(50),
uuid varchar(20),
FULLTEXT(title, category)
) ENGINE=InnoDB;
When I execute a full-text search, it takes 2.5s with 1M rows. So I ran EXPLAIN, and it does not use any index:
mysql> explain SELECT uuid, title, category, MATCH(title, category) AGAINST ('grimm' IN NATURAL LANGUAGE MODE) AS score FROM metadata HAVING score > 0 limit 20;
+----+-------------+----------+------------+------+---------------+------+---------+------+---------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+------+---------+------+---------+----------+-------+
| 1 | SIMPLE | metadata | NULL | ALL | NULL | NULL | NULL | NULL | 1036202 | 100.00 | NULL |
+----+-------------+----------+------------+------+---------------+------+---------+------+---------+----------+-------+
Is that expected? How can I speed this up?
Your query fetches every row in the table, calculates the natural language match, and then passes the results (still for every row) to the HAVING clause. This is a table-scan.
You should try putting the fulltext-indexed search into the WHERE clause instead, to reduce the number of matching rows.
mysql> explain SELECT uuid, title, category FROM metadata
WHERE MATCH(title, category) AGAINST ('grimm' IN NATURAL LANGUAGE MODE)
LIMIT 20;
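The effect of moving the match into WHERE can be sketched outside MySQL. Below is a rough analogue using SQLite's FTS5 from Python, standing in for MySQL's FULLTEXT index; the table name and sample rows are made up for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# FTS5 virtual table stands in for MySQL's FULLTEXT(title, category)
conn.execute("CREATE VIRTUAL TABLE metadata_fts USING fts5(title, category)")
conn.executemany(
    "INSERT INTO metadata_fts (title, category) VALUES (?, ?)",
    [("Grimm Tales", "fairy tales"),
     ("Cookbook", "food"),
     ("Brothers Grimm", "biography")],
)
# Match in WHERE: the full-text index narrows the rows up front,
# instead of computing a score for every row as HAVING would.
rows = conn.execute(
    "SELECT title FROM metadata_fts WHERE metadata_fts MATCH 'grimm' LIMIT 20"
).fetchall()
print(rows)
```

The same shape applies in MySQL: once the MATCH is in WHERE, the optimizer can use the fulltext index rather than scanning all rows.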
I have a table similar to the one below.
+---------+---------+---------+---------+
| user_id | point_1 | point_2 | point_3 |
+---------+---------+---------+---------+
| 453123  | 1234    | 32      | 433     |
| 321543  | 1       | 213     | 321     |
+---------+---------+---------+---------+
My query is something like this:
$query = "SELECT * FROM my_table WHERE user_id = 12345 OR user_id = 987654321"
Obviously, this will return nothing, since neither user_id 12345 nor user_id 987654321 exists in the table.
But I still want to return something like this:
+-----------+---------+---------+---------+
| user_id   | point_1 | point_2 | point_3 |
+-----------+---------+---------+---------+
| 12345     | 0       | 0       | 0       |
| 987654321 | 0       | 0       | 0       |
+-----------+---------+---------+---------+
You could use an inline view as a rowsource for your query. To return a zero in place of a NULL (which the outer join returns when no matching row is found in my_table), you can use the IFNULL function.
e.g.
SELECT s.user_id
, IFNULL(t.point_1,0) AS point_1
, IFNULL(t.point_2,0) AS point_2
, IFNULL(t.point_3,0) AS point_3
FROM ( SELECT 12345 AS user_id
UNION ALL SELECT 987654321
) s
LEFT
JOIN my_table t
ON t.user_id = s.user_id
NOTE: If datatype of user_id column my_table is character, then I'd enclose the literals in the inline view in single quotes. e.g. SELECT '12345' AS user_id. If the characterset of the column doesn't match your client characterset, e.g. database column is latin1, and client characterset is UTF8, you'd want to force the character strings to be a compatible (coercible) characterset... SELECT _latin1'12345' AS user_id
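Assuming the table and ids from the question, the pattern can be tried out with SQLite from Python (SQLite's IFNULL and LEFT JOIN behave like MySQL's here; the sample row is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (user_id INTEGER, point_1 INTEGER,"
             " point_2 INTEGER, point_3 INTEGER)")
conn.execute("INSERT INTO my_table VALUES (453123, 1234, 32, 433)")
# Inline view of the requested ids, LEFT JOINed to the real table;
# IFNULL turns the NULLs from the unmatched rows into zeroes.
sql = """
SELECT s.user_id,
       IFNULL(t.point_1, 0) AS point_1,
       IFNULL(t.point_2, 0) AS point_2,
       IFNULL(t.point_3, 0) AS point_3
FROM (SELECT 12345 AS user_id
      UNION ALL SELECT 987654321) s
LEFT JOIN my_table t ON t.user_id = s.user_id
"""
rows = conn.execute(sql).fetchall()
print(sorted(rows))
```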
You can't get the result you want by selecting from my_table alone; only rows that actually exist will be returned.
The only way I can think to do this is to insert the query values into a temp table and then outer join against that for your query.
So the basic process would be:
create table temp1 (user_id integer);
insert into temp1 values (987654321); -- repeat as needed for query.
select t.user_id, m.* from temp1 t left outer join my_table m on m.user_id = t.user_id;
drop table temp1;
This isn't very efficient though.
Your desired result resembles the result of an OUTER JOIN - when some records exist only in one table and not the other, an OUTER JOIN will show all of the rows from one of the joined tables, filling in missing fields from the other table with NULL values.
To solve your particular problem purely in SQL, you could create a second table that contains a single field with all of the user_id values that you want to be able to show in your result. Something like:
+-----------+
| user_id |
+-----------+
| 1 |
+-----------+
| 2 |
+-----------+
| 3 |
+-----------+
| ... |
+-----------+
| 12344 |
+-----------+
| 12345 |
+-----------+
| 12346 |
+-----------+
| ... |
+-----------+
And so on. If this second table is named all_ids, you could then get your desired result by modifying your query as follows (exact syntax may vary by database implementation):
SELECT
*
FROM
all_ids AS i
LEFT OUTER JOIN
my_table AS t ON i.user_id = t.user_id
WHERE
i.user_id = 12345
OR i.user_id = 987654321;
This should produce the following result set:
+-----------+----------+----------+----------+
| user_id | point_1 | point_2 | point_3 |
+-----------+----------+----------+----------+
| 12345 | NULL | NULL | NULL |
+-----------+----------+----------+----------+
| 987654321 | NULL | NULL | NULL |
+-----------+----------+----------+----------+
It should be noted that this table full of IDs could take up a significant amount of disk space. An integer column in MySQL can hold 4,294,967,296 4-byte values, or 16 GB of data sitting around purely for your convenience in displaying some other data you don't have. So unless you need some smaller range or set of IDs available, or have disk space coming out your ears, this approach simply may not be practical.
Personally, I would not ask the database to do this in the first place. Essentially it's a display issue; you already get all the information you need from the fact that certain rows were not returned by your query. I would solve the display issue outside of the database, which in your case means filling in those zeroes with PHP.
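That display-side fix is only a few lines in any language; here is a sketch in Python rather than PHP, where the `fetched` dict stands in for whatever your query actually returned:

```python
# Query only the rows that actually exist, then fill in zeroes for the
# ids that came back missing. `fetched` stands in for your query result.
requested = [12345, 987654321]
fetched = {453123: (1234, 32, 433)}
result = {uid: fetched.get(uid, (0, 0, 0)) for uid in requested}
print(result)  # -> {12345: (0, 0, 0), 987654321: (0, 0, 0)}
```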
I have seen lots of posts on deleting duplicate rows using SQL commands, but I need to deduplicate on a column that is MEDIUMTEXT.
I keep getting the error Error Code: 1170. BLOB/TEXT column used in key specification without a key length from solutions such as:
ALTER IGNORE TABLE foobar ADD UNIQUE (title, SID)
My table is simple: I need to check for duplicates in mytext; id is unique and AUTO_INCREMENT.
As a note, the table has about a million rows, and all attempts keep timing out. I would need a solution that performs the actions in batches, such as WHERE id>0 AND id<100.
Also, I am using MySQL Workbench on Amazon RDS.
From a table like this
+----+-------+-------+--------+---------+
| id | fname | lname | mytext | morevar |
+----+-------+-------+--------+---------+
| 1  | joe   | min   | abc    | 123     |
| 2  | joe   | min   | abc    | 123     |
| 3  | mar   | kam   | def    | 789     |
| 4  | kel   | smi   | ghi    | 456     |
+----+-------+-------+--------+---------+
I would like to end up with a table like this
+----+-------+-------+--------+---------+
| id | fname | lname | mytext | morevar |
+----+-------+-------+--------+---------+
| 1  | joe   | min   | abc    | 123     |
| 3  | mar   | kam   | def    | 789     |
| 4  | kel   | smi   | ghi    | 456     |
+----+-------+-------+--------+---------+
Update: forgot to mention this is on Amazon RDS using MySQL Workbench.
My table is very large, and I keep getting the error Error Code: 1205. Lock wait timeout exceeded from this SQL command:
DELETE n1 FROM names n1, names n2 WHERE n1.id > n2.id AND n1.name = n2.name
Also, if anyone else is having issues with MySQL Workbench timing out, the fix is:
Go to Preferences -> SQL Editor and set this parameter to a bigger value:
DBMS connection read time out (in seconds)
OPTION #1: Delete all duplicates records leaving one of each (e.g. the one with max(id))
DELETE
FROM yourTable
WHERE id NOT IN
(
SELECT MAX(id)
FROM yourTable
GROUP BY mytext
)
You could use MIN(id) instead.
Depending on the MySQL version, this won't work and, as happened here, will give you Error Code: 1093. You can't specify target table 'yourTable' for update in FROM clause. Why? MySQL refuses to delete from a table it is simultaneously reading in a subquery, because deleting a record could change the subquery's result, i.e. the value of MAX(id).
In this case, you could try using another subquery as a temporary table:
DELETE
FROM yourTable
WHERE id NOT IN
(
SELECT MAXID FROM
(
SELECT MAX(id) as MAXID
FROM yourTable
GROUP BY mytext
) as temp_table
)
OPTION #2: Use a temporary table like in this example or:
First, create a temp table with the max ids (MySQL doesn't support SELECT ... INTO a table, so use CREATE TABLE ... AS):
CREATE TEMPORARY TABLE tmpTable AS
SELECT MAX(id) AS MAXID
FROM yourTable
GROUP BY mytext;
Then execute the delete:
DELETE
FROM yourTable
WHERE id NOT IN
(
SELECT MAXID FROM tmpTable
);
How about this? It will delete all the duplicate records from the table, keeping the row with the lowest id for each mytext value (the id condition is required; without it every row matches itself and the whole table is deleted):
DELETE t1 FROM foobar t1, foobar t2 WHERE t1.mytext = t2.mytext AND t1.id > t2.id
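For the batching the asker needs (the table is too large for one statement), the same delete can be run over id ranges so each statement touches few rows and holds its locks briefly. A sketch using SQLite from Python as a stand-in for MySQL; the data and batch size are made up for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foobar (id INTEGER PRIMARY KEY, mytext TEXT)")
# Odd ids all share the text "dup"; even ids are unique.
conn.executemany("INSERT INTO foobar (id, mytext) VALUES (?, ?)",
                 [(i, "dup" if i % 2 else "row%d" % i) for i in range(1, 201)])
# Delete duplicates in id ranges; batch size 100 is arbitrary here,
# something like 100000 would suit a million-row table.
batch = 100
max_id = conn.execute("SELECT MAX(id) FROM foobar").fetchone()[0]
for lo in range(1, max_id + 1, batch):
    conn.execute("""
        DELETE FROM foobar
        WHERE id BETWEEN ? AND ?
          AND id NOT IN (SELECT MIN(id) FROM foobar GROUP BY mytext)
    """, (lo, lo + batch - 1))
    conn.commit()  # commit per batch so locks are released between batches
remaining = conn.execute("SELECT COUNT(*) FROM foobar").fetchone()[0]
print(remaining)  # -> 101 (100 unique rows plus one survivor of "dup")
```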
This query is very simple, all I want to do, is get all the articles in given category ordered by last_updated field:
SELECT
`articles`.*
FROM
`articles`,
`articles_to_categories`
WHERE
`articles`.`id` = `articles_to_categories`.`article_id`
AND `articles_to_categories`.`category_id` = 1
ORDER BY `articles`.`last_updated` DESC
LIMIT 0, 20;
But it runs very slow. Here is what EXPLAIN said:
select_type table type possible_keys key key_len ref rows Extra
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SIMPLE articles_to_categories ref article_id,category_id article_id 5 const 5016 Using where; Using temporary; Using filesort
SIMPLE articles eq_ref PRIMARY PRIMARY 4 articles_to_categories.article_id 1
Is there a way to rewrite this query or add additional logic to my PHP scripts to avoid Using temporary; Using filesort and speed thing up?
The table structure:
*articles*
id | title | content | last_updated
*articles_to_categories*
article_id | category_id
UPDATE
I have last_updated indexed. I guess my situation is explained in documentation:
In some cases, MySQL cannot use indexes to resolve the ORDER BY, although it still uses indexes to find the rows that match the WHERE clause. These cases include the following:
The key used to fetch the rows is not the same as the one used in the ORDER BY:
SELECT * FROM t1 WHERE key2=constant ORDER BY key1;
You are joining many tables, and the columns in the ORDER BY are not all from the first nonconstant table that is used to retrieve rows. (This is the first table in the EXPLAIN output that does not have a const join type.)
but I still have no idea how to fix this.
Here's a simplified example I did for a similar performance-related question some time ago that takes advantage of InnoDB clustered primary key indexes (obviously only available with InnoDB!):
http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html
http://www.xaprb.com/blog/2006/07/04/how-to-exploit-mysql-index-optimizations/
You have 3 tables: category, product and product_category as follows:
drop table if exists product;
create table product
(
prod_id int unsigned not null auto_increment primary key,
name varchar(255) not null unique
)
engine = innodb;
drop table if exists category;
create table category
(
cat_id mediumint unsigned not null auto_increment primary key,
name varchar(255) not null unique
)
engine = innodb;
drop table if exists product_category;
create table product_category
(
cat_id mediumint unsigned not null,
prod_id int unsigned not null,
primary key (cat_id, prod_id) -- **note the clustered composite index** !!
)
engine = innodb;
The most important thing is the column order of the product_category clustered composite primary key, as typical queries for this scenario always lead with cat_id = x or cat_id IN (x,y,z...).
We have 500K categories, 1 million products and 125 million product categories.
select count(*) from category;
+----------+
| count(*) |
+----------+
| 500000 |
+----------+
select count(*) from product;
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
select count(*) from product_category;
+-----------+
| count(*) |
+-----------+
| 125611877 |
+-----------+
So let's see how this schema performs for a query similar to yours. All queries are run cold (after mysql restart) with empty buffers and no query caching.
select
p.*
from
product p
inner join product_category pc on
pc.cat_id = 4104 and pc.prod_id = p.prod_id
order by
p.prod_id desc -- sorry, no date field in this sample table; won't make any difference though
limit 20;
+---------+----------------+
| prod_id | name |
+---------+----------------+
| 993561 | Product 993561 |
| 991215 | Product 991215 |
| 989222 | Product 989222 |
| 986589 | Product 986589 |
| 983593 | Product 983593 |
| 982507 | Product 982507 |
| 981505 | Product 981505 |
| 981320 | Product 981320 |
| 978576 | Product 978576 |
| 973428 | Product 973428 |
| 959384 | Product 959384 |
| 954829 | Product 954829 |
| 953369 | Product 953369 |
| 951891 | Product 951891 |
| 949413 | Product 949413 |
| 947855 | Product 947855 |
| 947080 | Product 947080 |
| 945115 | Product 945115 |
| 943833 | Product 943833 |
| 942309 | Product 942309 |
+---------+----------------+
20 rows in set (0.70 sec)
explain
select
p.*
from
product p
inner join product_category pc on
pc.cat_id = 4104 and pc.prod_id = p.prod_id
order by
p.prod_id desc -- sorry, no date field in this sample table; won't make any difference though
limit 20;
+----+-------------+-------+--------+---------------+---------+---------+------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+------------------+------+----------------------------------------------+
| 1 | SIMPLE | pc | ref | PRIMARY | PRIMARY | 3 | const | 499 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | p | eq_ref | PRIMARY | PRIMARY | 4 | vl_db.pc.prod_id | 1 | |
+----+-------------+-------+--------+---------------+---------+---------+------------------+------+----------------------------------------------+
2 rows in set (0.00 sec)
So that's 0.70 seconds cold - ouch.
Hope this helps :)
EDIT
Having just read your reply to my comment above it seems you have one of two choices to make:
create table articles_to_categories
(
article_id int unsigned not null,
category_id mediumint unsigned not null,
primary key(article_id, category_id), -- good for queries that lead with article_id = x
key (category_id)
)
engine=innodb;
or:
create table categories_to_articles
(
article_id int unsigned not null,
category_id mediumint unsigned not null,
primary key(category_id, article_id), -- good for queries that lead with category_id = x
key (article_id)
)
engine=innodb;
How you define your clustered PK depends on your typical queries.
You should be able to avoid the filesort by adding a key on articles.last_updated. MySQL uses a filesort for the ORDER BY unless it can read the rows in order from an index, so ordering by an indexed column avoids it (with some limitations).
For much more info, see here: http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html
I assume you have made the following in your db:
1) articles -> id is a primary key
2) articles_to_categories -> article_id is a foreign key of articles -> id
3) you can create an index on category_id
ALTER TABLE articles ADD INDEX (last_updated);
ALTER TABLE articles_to_categories ADD INDEX (article_id);
should do it. The right plan is to find the first few records using the first index and do the JOIN using the second one. If it doesn't work, try STRAIGHT_JOIN or something to enforce proper index usage.
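A quick way to convince yourself the indexed query returns the right rows, using SQLite from Python as a stand-in for MySQL (table contents are made up; the two CREATE INDEX statements mirror the ALTER TABLE advice above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY,"
             " title TEXT, last_updated TEXT)")
conn.execute("CREATE TABLE articles_to_categories"
             " (article_id INTEGER, category_id INTEGER)")
# The two indexes the answer recommends:
conn.execute("CREATE INDEX idx_last_updated ON articles (last_updated)")
conn.execute("CREATE INDEX idx_article_id ON articles_to_categories (article_id)")
conn.executemany("INSERT INTO articles VALUES (?, ?, ?)",
                 [(1, "a", "2024-01-01"), (2, "b", "2024-03-01"),
                  (3, "c", "2024-02-01")])
conn.executemany("INSERT INTO articles_to_categories VALUES (?, ?)",
                 [(1, 1), (2, 1), (3, 2)])
# Newest-first articles in category 1: only articles 1 and 2 qualify.
rows = conn.execute("""
    SELECT a.id, a.last_updated
    FROM articles a
    JOIN articles_to_categories ac ON ac.article_id = a.id
    WHERE ac.category_id = 1
    ORDER BY a.last_updated DESC
    LIMIT 20
""").fetchall()
print(rows)  # -> [(2, '2024-03-01'), (1, '2024-01-01')]
```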
I have a mySQL database with a tad under 2 million rows. The database is non-interactive, so efficiency isn't key.
The (simplified) structure I have is:
`id` int(11) NOT NULL auto_increment
`category` varchar(64) NOT NULL
`productListing` varchar(256) NOT NULL
Now, the problem I would like to solve: I want to find duplicates on the productListing field, merge the category values into a single result, and delete the duplicates.
So given the following data:
+----+-----------+---------------------------+
| id | category | productListing |
+----+-----------+---------------------------+
| 1 | Category1 | productGroup1 |
| 2 | Category2 | productGroup1 |
| 3 | Category3 | anotherGroup9 |
+----+-----------+---------------------------+
What I want to end up is with:
+----+----------------------+---------------------------+
| id | category | productListing |
+----+----------------------+---------------------------+
| 1 | Category1,Category2 | productGroup1 |
| 3 | Category3 | anotherGroup9 |
+----+----------------------+---------------------------+
What's the most efficient way to do this either in pure mySQL query or php?
I think you're looking for GROUP_CONCAT:
SELECT GROUP_CONCAT(category), productListing
FROM YourTable
GROUP BY productListing
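A small demonstration with SQLite from Python, using the question's sample data (SQLite's group_concat behaves like MySQL's GROUP_CONCAT for this case, minus the ORDER BY/SEPARATOR options):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER PRIMARY KEY,"
             " category TEXT, productListing TEXT)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)",
                 [(1, "Category1", "productGroup1"),
                  (2, "Category2", "productGroup1"),
                  (3, "Category3", "anotherGroup9")])
# One row per productListing, categories merged into a CSV string
rows = conn.execute("""
    SELECT MIN(id), GROUP_CONCAT(category), productListing
    FROM YourTable
    GROUP BY productListing
    ORDER BY MIN(id)
""").fetchall()
print(rows)
```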
I would create a new table, inserting the updated values, delete the old one and rename the new table to the old one's name:
CREATE TABLE new_YourTable SELECT GROUP_CONCAT(...;
DROP TABLE YourTable;
RENAME TABLE new_YourTable TO YourTable;
-- don't forget to add triggers, indexes, foreign keys, etc. to new table
SELECT MIN(id), GROUP_CONCAT(category ORDER BY id SEPARATOR ','), productListing
FROM mytable
GROUP BY
productListing