I am building a PHP backend API that executes a query against a MySQL database. This is the query:
SELECT * FROM $TABLE_GAMES WHERE
($GAME_RECEIVERID = '$userId' OR $GAME_OTHERID = '$userId')
ORDER BY $GAME_ID LIMIT 1
Essentially, I'm passing $userId as a parameter and getting the row with the smallest $GAME_ID value. The query returns a result in less than 100 ms for users that have around 30,000 matching rows in the table. However, I have since added new users that have fewer than 100 matching rows, and the query is painfully slow for them, taking around 20-30 seconds every time.
I'm puzzled as to why the query is so much slower when it is supposed to match only a small number of rows, and extremely fast when it matches a huge number of rows, especially since I have an ORDER BY.
I have read about parameter sniffing, but as far as I know that's a SQL Server thing, and I'm using MySQL.
EDIT
Here is the SHOW CREATE TABLE statement:
CREATE TABLE `games` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `SenderID` int(11) NOT NULL,
  `ReceiverID` int(11) NOT NULL,
  `OtherID` int(11) NOT NULL,
  `Timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`ID`)
) ENGINE=MyISAM AUTO_INCREMENT=17275279 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Here is the output of EXPLAIN
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | games | NULL       | index | NULL          | PRIMARY | 4       | NULL |    1 |    19.00 | Using where |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
I tried a prepared statement, but I'm still getting the same result.
Sorry for the poor formatting, I'm still a noob at this.
You need to use EXPLAIN to analyse the performance of the query.
i.e.
EXPLAIN SELECT * FROM $TABLE_GAMES WHERE
($GAME_RECEIVERID = '$userId' OR $GAME_OTHERID = '$userId')
ORDER BY $GAME_ID LIMIT 1
EXPLAIN provides information about the SELECT query together with its execution plan.
It is a great tool for identifying what makes a query slow. Based on the information it returns, you can create indexes for the columns used in the WHERE clause:
CREATE INDEX index_name ON table_name (column_list)
This should noticeably improve the performance of the query.
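For the table shown in the question, that could look something like this (a sketch; the index names are made up, the column names come from the SHOW CREATE TABLE output):

-- Hypothetical index names; ReceiverID and OtherID are the columns used in the WHERE clause.
CREATE INDEX idx_games_receiver ON games (ReceiverID);
CREATE INDEX idx_games_other ON games (OtherID);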
Your query is slow because it cannot find a matching record fast enough. For users where a lot of rows match, the chances of finding a record to return early are much higher, all other things being equal.
That behavior appears when $GAME_RECEIVERID and $GAME_OTHERID aren't part of an index, prompting MySQL to use the index on $GAME_ID because of the ordering. However, since newer players have not played the early games, there are literally millions of rows that won't match, but have to be checked nonetheless.
Unfortunately, this is bound to get worse even for old users, as your database grows. Ideally, you will add indexes on $GAME_RECEIVERID and $GAME_OTHERID - something like:
ALTER TABLE games
ADD INDEX receiver (ReceiverID),
ADD INDEX other (OtherID)
PS: Altering a 17 million rows table is going to take a while, so make sure to do it during a maintenance window or similar if this is used in production.
Is this the query after the interpolation? That is, is this what MySQL will see?
SELECT * FROM GAMES
WHERE RECEIVERID = '123'
OR OTHERID = '123'
ORDER BY ID LIMIT 1
Then this will run fast, regardless:
SELECT *
FROM GAMES
WHERE ID = LEAST(
( SELECT MIN(ID) FROM GAMES WHERE RECEIVERID = '123' ),
( SELECT MIN(ID) FROM GAMES WHERE OTHERID = '123' )
);
But, you will need both of these:
INDEX(RECEIVERID, ID),
INDEX(OTHERID, ID)
Your version of the query scans the table until it finds a matching row. My version will:
- make two indexed lookups;
- fetch the other columns for the one row.
It will be equally fast regardless of how many rows there are for USERID.
(Recommend switching to InnoDB.)
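For reference, the two composite indexes above could be added like this (a sketch; the index names are made up, the column names come from the CREATE TABLE in the question):

ALTER TABLE games
  -- Each index starts with the filter column and ends with ID, so the
  -- MIN(ID) subqueries can be answered from the index alone.
  ADD INDEX receiver_id (ReceiverID, ID),
  ADD INDEX other_id (OtherID, ID);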
Related
I have a DB of 100,000 users in MySQL. In that DB I have the columns ID, username, Fname, Lname, etc.
When someone visits www.example.com/Jim or www.example.com/123 (where Jim is the username and 123 is the ID in the users table),
I am using the MySQL query: select * from users where ID = 123 OR username = 'Jim'
I am executing the above query in PHP.
Output of the above query is :
+-----+----------+-------+--------+
| ID  | Username | fname | lname  |
+-----+----------+-------+--------+
| 123 | jim      | Jim   | Jonson |
+-----+----------+-------+--------+
My problem is that it's taking a huge amount of time to select by username or ID in the DB.
I have used the following query:
SELECT * FROM `users` USE INDEX (UsersIndexId) WHERE id = 123
Is this the right way to use an index?
EXPLAIN SELECT * FROM `users` WHERE ID =327
Output:
id  select_type  table  type   possible_keys         key      key_len  ref    rows  Extra
1   SIMPLE       users  const  PRIMARY,UsersIndexId  PRIMARY  4        const  1
I suggest you take a look at this: How MySQL Uses Indexes
Quoting from the first paragraph:
Indexes are used to find rows with specific column values quickly.
Without an index, MySQL must begin with the first row and then read
through the entire table to find the relevant rows. The larger the
table, the more this costs. If the table has an index for the columns
in question, MySQL can quickly determine the position to seek to in
the middle of the data file without having to look at all the data. If
a table has 1,000 rows, this is at least 100 times faster than reading
sequentially.
That should help speed up your search.
(Edit: Updated the link to a newer version of the SQL docs)
PS: More specifically, column indexes might be what you want.
You can find more info about adding indexes here: Create Index Syntax
To complete @Kjartan's answer, you can try the following:
ALTER TABLE users ADD INDEX id_i (`ID`);
ALTER TABLE users ADD INDEX username_i (`Username`);
Your queries should be faster.
My table:
Field    Type           Null  Key  Default  Extra
id       int(11)        NO    PRI  NULL     auto_increment
userid   int(11)        NO    MUL  NULL
title    varchar(50)    YES        NULL
hosting  varchar(10)    YES        NULL
zipcode  varchar(5)     YES        NULL
lat      varchar(20)    YES        NULL
long     varchar(20)    YES        NULL
msg      varchar(1000)  YES   MUL  NULL
time     datetime       NO         NULL
That is the table. I have simulated 500k rows of data and randomly deleted 270k rows, leaving only 230k rows with an auto-increment value of 500k.
Here are my indexes:
Keyname  Type   Unique  Packed  Field   Cardinality  Collation  Null
PRIMARY  BTREE  Yes     No      id      232377       A
info     BTREE  No      No      userid  2003         A
                                lat     25819        A          YES
                                long    25819        A          YES
                                title   25819        A          YES
                                time    25819        A
With that in mind, here is my query:
SELECT * FROM posts WHERE long>-118.13902802886 AND long<-118.08130797114 AND lat>33.79987197114 AND lat<33.85759202886 ORDER BY id ASC LIMIT 0, 25
Showing rows 0 - 15 (16 total, Query took 1.5655 sec) [id: 32846 - 540342]
The query only brought me 1 page, but because it had to search all 230k records it still took 1.5 seconds.
Here is the query explained:
id  select_type  table  type   possible_keys  key      key_len  ref   rows  Extra
1   SIMPLE       posts  index  NULL           PRIMARY  4        NULL  25    Using where
So even if I use WHERE clauses to get back only 16 results, I still get a slow query.
Now, for example, if I do a broader search:
SELECT * FROM `posts` WHERE `long`>-118.2544681443 AND `long`<-117.9658678557 AND `lat`>33.6844318557 AND `lat`<33.9730321443 ORDER BY id ASC LIMIT 0, 25
Showing rows 0 - 24 (25 total, Query took 0.0849 sec) [id: 691 - 29818]
It is much faster when retrieving the first page out of 20 pages (483 rows found in total, but limited to 25).
But if I ask for the last page:
SELECT * FROM `posts` WHERE `long`>-118.2544681443 AND `long`<-117.9658678557 AND `lat`>33.6844318557 AND `lat`<33.9730321443 ORDER BY id ASC LIMIT 475, 25
Showing rows 0 - 7 (8 total, Query took 1.5874 sec) [id: 553198 - 559593]
I get a slow query.
My question is: how do I achieve good pagination? When the website goes live, I expect that once it takes off, posts will be created and deleted by the hundreds daily.
Posts should be ordered by id or timestamp, and id is not sequential because some records will be deleted.
I want to have standard pagination:
1 2 3 4 5 6 7 8 ... [Last Page]
Use a WHERE clause to filter out of your results the records that appeared on earlier pages: then you do not need to specify an offset, only a row count. For example, keep track of the last id or timestamp seen and filter for only those records with an id or timestamp greater than that.
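A minimal sketch of that approach against the posts table from the question (:last_seen_id is a placeholder for the id of the last row on the previous page):

-- Keyset ("seek") pagination: no OFFSET, so MySQL only has to read the next 25 matching rows.
SELECT *
FROM posts
WHERE id > :last_seen_id
ORDER BY id ASC
LIMIT 25;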
Unfortunately, MySQL has to read (and first sort) all the 20000 rows before it outputs your 30 results. If you can, try narrowing down the search by filtering on indexed columns within the WHERE clause.
A few remarks.
Given that you order by id, on each page you have the id of the first and last record, so rather than LIMIT 200000, 20 you should use WHERE id > $last_id LIMIT 20, and that would be blazingly fast.
The drawback, obviously, is that you cannot offer a "last" page, or any page in between, if ids are not sequential (rows deleted in between). You may then use a combination of the last known id and an offset + limit.
And obviously, having proper indexes will also help the sorting and limiting.
It looks like you only have a primary key index. You might want to define an index on the fields you use, such as:
create index idx_posts_id on posts (`id` ASC);
create index idx_posts_id_time on posts (`id` ASC, `time` ASC);
Having a regular index on your key field, besides your primary unique key index, usually speeds up MySQL a lot.
MySQL loses quite a bit of performance with a large offset. From the MySQL Performance Blog:

Beware of large LIMIT: Using an index to sort is efficient if you need the first few rows, even if some extra filtering takes place so you need to scan more rows by index than requested by the LIMIT. However, if you're dealing with a LIMIT query with a large offset, efficiency will suffer. LIMIT 1000,10 is likely to be way slower than LIMIT 0,10. It is true most users will not go further than 10 pages in the results; however, search engine bots may very well do so. I've seen bots looking at 200+ pages in my projects. Also, for many web sites, failing to take care of this provides a very easy way to launch a DOS attack: request a page with some large number from a few connections and it is enough. If you do not do anything else, make sure you block requests with too large page numbers.

In some cases, for example if the results are static, it may make sense to precompute the results so you can query them by position.
So instead of a query with LIMIT 1000,10 you will have WHERE position BETWEEN 1000 AND 1009, which has the same efficiency for any position (as long as it is indexed).
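A rough sketch of that precomputation idea (not from the quoted article's code; the position column and its index are assumptions you would have to add yourself):

-- One-off setup (hypothetical column and index):
-- ALTER TABLE posts ADD COLUMN position INT, ADD INDEX idx_position (position);

-- Renumber rows in display order whenever the data changes:
SET @pos := 0;
UPDATE posts SET position = (@pos := @pos + 1) ORDER BY id;

-- Pagination then becomes an indexed range scan instead of a large offset:
SELECT * FROM posts WHERE position BETWEEN 1000 AND 1009;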
If you are using AUTO_INCREMENT you may use:
SELECT *
FROM `posts`
WHERE `id` >= 200000
ORDER BY `id` ASC
LIMIT 30
This way MySQL only has to traverse the rows with id 200000 and above.
I figured it out. What was slowing me down was the ORDER BY. Since I was using a LIMIT, the further down I asked to go, the more it had to sort. So I fixed it by adding a subquery that first extracts the data I want with a WHERE clause, and then applying ORDER BY and LIMIT on the outside:
SELECT * FROM
  (SELECT * FROM `posts` AS `p`
   WHERE `p`.`long` > -119.2544681443
     AND `p`.`long` < -117.9658678557
     AND `p`.`lat` > 32.6844318557
     AND `p`.`lat` < 34.9730321443
  ) AS posttable
ORDER BY id DESC
LIMIT x, n
By doing that I achieved the following:
id  select_type  table       type  possible_keys  key   key_len  ref   rows    Extra
1   PRIMARY      <derived2>  ALL   NULL           NULL  NULL     NULL  3031    Using filesort
2   DERIVED      p           ALL   NULL           NULL  NULL     NULL  232377  Using where
Now I filter the 232k rows using the WHERE clause, and only ORDER BY and LIMIT the 3,031 resulting rows.
Showing rows 0 - 3030 (3,031 total, Query took 0.1431 sec)
I have a script to find duplicate rows in my MySQL table; the table contains 40,000,000 rows. But it is very slow going. Is there an easier way to find the duplicate records without going in and out of PHP?
This is the script I currently use:
$find = mysql_query("SELECT * FROM pst_nw WHERE ID < '1000'");
while ($row = mysql_fetch_assoc($find))
{
    $find_1 = mysql_query("SELECT * FROM pst_nw WHERE add1 = '$row[add1]' AND add2 = '$row[add2]' AND add3 = '$row[add3]' AND add4 = '$row[add4]'");
    if (mysql_num_rows($find_1) > 0) {
        mysql_query("DELETE FROM pst_nw WHERE ID = '$row[ID]'");
    }
}
You have a number of options.
Let the DB do the work
Create a copy of your table with a unique index - and then insert the data into it from your source table:
CREATE TABLE clean LIKE pst_nw;
ALTER IGNORE TABLE clean ADD UNIQUE INDEX (add1, add2, add3, add4);
INSERT IGNORE INTO clean SELECT * FROM pst_nw;
DROP TABLE pst_nw;
RENAME TABLE clean TO pst_nw;
The advantage of doing things this way is you can verify that your new table is correct before dropping your source table. The disadvantage is it takes up twice as much space and is (relatively) slow to execute.
Let the DB do the work #2
You can also achieve the result you want by doing:
set session old_alter_table=1;
ALTER IGNORE TABLE pst_nw ADD UNIQUE INDEX (add1, add2, add3, add4);
The first command is required as a workaround for the ignore flag being .. ignored
The advantage here is there's no messing about with a temporary table - the disadvantage is you don't get to check that your update does exactly what you expect before you run it.
Example:
CREATE TABLE `foo` (
`id` int(10) NOT NULL AUTO_INCREMENT,
`one` int(10) DEFAULT NULL,
`two` int(10) DEFAULT NULL,
PRIMARY KEY (`id`)
)
insert into foo values (null, 1, 1);
insert into foo values (null, 1, 1);
insert into foo values (null, 1, 1);
select * from foo;
+----+------+------+
| id | one | two |
+----+------+------+
| 1 | 1 | 1 |
| 2 | 1 | 1 |
| 3 | 1 | 1 |
+----+------+------+
3 rows in set (0.00 sec)
set session old_alter_table=1;
ALTER IGNORE TABLE foo ADD UNIQUE INDEX (one, two);
select * from foo;
+----+------+------+
| id | one | two |
+----+------+------+
| 1 | 1 | 1 |
+----+------+------+
1 row in set (0.00 sec)
Don't do this kind of thing outside the DB
Especially with 40 million rows doing something like this outside the db is likely to take a huge amount of time, and may not complete at all. Any solution that stays in the db will be faster, and more robust.
Usually in questions like this the problem is "I have duplicate rows, want to keep only one row, any one".
But judging from the code, what you want is: "if a set of add1, add2, add3, add4 is duplicated, DELETE ALL COPIES WITH ID < 1000". In this case, copying from the table to another with INSERT IGNORE won't do what you want - might even keep rows with lower IDs and discard subsequent ones.
I believe you need to run something like this to gather all the "bad" IDs (IDs that have a duplicate with a higher ID). In this code I used AND bad.ID < good.ID, so if you have ID 777 which duplicates ID 888, ID 777 will still get deleted. If this is not what you want, you can change that condition to AND bad.ID < 1000 AND good.ID > 1000, or something like that.
CREATE TABLE bad_ids AS
SELECT bad.ID FROM pst_nw AS bad JOIN pst_nw AS good
ON ( bad.ID < 1000 AND bad.ID < good.ID
AND bad.add1 = good.add1
AND bad.add2 = good.add2
AND bad.add3 = good.add3
AND bad.add4 = good.add4 );
Then once you have all bad IDs into a table,
DELETE pst_nw.* FROM pst_nw JOIN bad_ids ON (pst_nw.ID = bad_ids.ID);
Performance will greatly benefit from a (non-unique, possibly only temporary) index on add1, add2, add3, add4 and ID, in this order.
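A sketch of that index (the name is hypothetical; drop it again once the cleanup is done):

ALTER TABLE pst_nw ADD INDEX tmp_dedup (add1, add2, add3, add4, ID);
-- After the DELETE has run:
-- ALTER TABLE pst_nw DROP INDEX tmp_dedup;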
Get the duplicate rows using the GROUP BY operator. Here is a sample that you can try:
select id
from table
group by matching_field1,matching_field2....
having count(id) > 1
So you are getting all the duplicate ids. Now delete them using a DELETE query.
Instead of using IN, use the OR operator, as IN is slow compared to OR.
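A rough sketch of that follow-up delete (the ids below are placeholders for whatever the GROUP BY query returns):

DELETE FROM pst_nw
WHERE ID = 101   -- placeholder id
   OR ID = 205   -- placeholder id
   OR ID = 309;  -- placeholder id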
Sure there is. Note, however, that with 40 million records you will most probably exceed the maximum PHP execution time. Try the following:
CREATE TABLE temp_pst_nw LIKE pst_nw;
INSERT INTO temp_pst_nw SELECT * FROM pst_nw GROUP BY add1, add2, add3, add4;
Confirm that everything is OK first!!
DROP TABLE pst_nw;
RENAME TABLE temp_pst_nw TO pst_nw;
Try creating a new table that has the same definition, e.g. "my_table_two", then do:
INSERT INTO my_table_two
SELECT DISTINCT unique_col1, col2, col3 [...] FROM my_table;
Maybe that'll sort it out.
Your code will be better if you don't use SELECT *; select only the columns (the 4 address fields) you want to compare. The SQL should also have a LIMIT clause, which helps avoid the script hanging when you have that many rows.
In situations like this which method or mix of methods performs the quickest?
$year = db_get_fields("select distinct year from car_cache order by year desc");
Or
$year = db_get_fields("select year from car_cache");
$year = array_unique($year);
sort($year);
I've heard that DISTINCT in MySQL is a big performance hit for large queries, and this table can have a million rows or more. I also wondered what storage engine, InnoDB or MyISAM, would work best. I know many optimizations are very query dependent. Year is an unsigned number, but the other fields are varchars of different lengths, which I know may make a difference too. Such as:
$line = db_get_fields("select distinct line from car_cache where year='$postyear' and make='$postmake' order by line desc");
I read that using the new InnoDB multiple-key method can make queries like this one very quick, but the DISTINCT and ORDER BY clauses are red flags to me.
Have MySQL do as much work as possible. If it isn't being efficient at what it's doing, then things likely aren't set up correctly (whether that is proper indexing for the query you are trying to run, or settings such as sort buffers).
If you have an index on the year column, then using DISTINCT should be efficient. If you do not, then a full table scan is necessary in order to fetch the distinct rows. If you try to sort out the distinct rows in PHP rather than MySQL, then you transmit (potentially) much more data from MySQL to PHP, and PHP consumes much more memory to store all that data before eliminating the duplicates.
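For example, an index on year alone might look like this (a sketch; the index name is made up, the table name comes from the question):

ALTER TABLE car_cache ADD INDEX idx_year (year);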
Here is some sample output from a dev database I have. Also note that this database is on a different server on the network from where the queries are being executed.
SELECT COUNT(SerialNumber) FROM `readings`;
> 97698592
SELECT SQL_NO_CACHE DISTINCT `SerialNumber`
FROM `readings`
ORDER BY `SerialNumber` DESC
LIMIT 10000;
> Fetched 10000 records. Duration: 0.801 sec, fetched in: 0.082 sec
> EXPLAIN *above_query*
+----+-------------+----------+-------+---------------+---------+---------+------+------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+---------+---------+------+------+-----------------------------------------------------------+
| 1 | SIMPLE | readings | range | NULL | PRIMARY | 18 | NULL | 19 | Using index for group-by; Using temporary; Using filesort |
+----+-------------+----------+-------+---------------+---------+---------+------+------+-----------------------------------------------------------+
If I attempt the same query, except replace the SerialNumber column with one that is non-indexed, then it takes forever to run because MySQL has to examine all 97 million rows.
Some of the efficiency has to do with how much data you expect to get back. If I slightly modify the above queries to operate on the time column (the timestamp of the reading), then it takes 1 min 40 sec to get a distinct list of 273,505 times; most of the overhead there is in transferring all the records over the network. So keep in mind how much data you are getting back: you want to keep that as low as possible for the data you are trying to fetch.
As for your final query:
select distinct line from car_cache
where year='$postyear' and make='$postmake'
order by line desc
There should be no problem with that either; just make sure you have a compound index on year and make, and possibly an index on line.
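Something along these lines (a sketch; the index names are made up, the columns come from the question's queries):

ALTER TABLE car_cache
  ADD INDEX idx_year_make (year, make),
  ADD INDEX idx_line (line);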
On a final note, the engine I am using for the readings table is InnoDB, and my server is: 5.5.23-55-log Percona Server (GPL), Release 25.3 which is a version of MySQL by Percona Inc.
Hope that helps.
I have approx. 200K rows in a table tb_post, and every 5 minutes approx. 10 new rows are inserted.
I'm using the following query to fetch the rows:
SELECT tb_post.ID, tb_post.USER_ID, tb_post.TEXT, tb_post.RATING, tb_post.CREATED_AT,
tb_user.ID, tb_user.NAME
FROM tb_post, tb_user
WHERE tb_post.USER_ID=tb_user.ID
ORDER BY tb_post.RATING DESC
LIMIT 30
It's taking more than 10 seconds to fetch the rows in sorted order.
Following is the EXPLAIN report for the query:
id  select_type  table    type  possible_keys  key           key_len  ref         rows   Extra
1   SIMPLE       tb_user  ALL   PRIMARY        NULL          NULL     NULL        20950  Using temporary; Using filesort
1   SIMPLE       tb_post  ref   tb_post_FI_1   tb_post_FI_1  4        tb_user.id  4
A few inputs:
tb_post.RATING is of FLOAT type
There is an index on tb_post.USER_ID
Can anyone give me a few pointers on how I should optimize this query and improve its read performance?
PS: I'm a newbie with database scaling issues, so any kind of suggestion specific to this query will be useful.
You need an index for tb_post that covers both the ORDER BY and WHERE clause.
CREATE INDEX idx2 on tb_post (rating,user_id)
Here is the output of EXPLAIN SELECT ... ORDER BY tb_post.RATING DESC LIMIT 30 with that index in place:
id  select_type  table    type    possible_keys  key      key_len  ref                   rows  Extra
1   SIMPLE       tb_post  index   NULL           idx2     10       NULL                  352
1   SIMPLE       tb_user  eq_ref  PRIMARY        PRIMARY  4        test.tb_post.USER_ID  1
You could try to index tb_post.RATING: MySQL can sometimes use indexes to optimize ORDER BY clauses: http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html
If you're trying to aggregate data from different tables, you could also check which type of join (http://en.wikipedia.org/wiki/Join_(SQL)) you want; some perform better than others, depending on what you need.
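For instance, the query from the question written as an explicit inner join (a rewrite only; it has the same semantics as the comma join and does not by itself change the execution plan):

SELECT p.ID, p.USER_ID, p.TEXT, p.RATING, p.CREATED_AT,
       u.ID, u.NAME
FROM tb_post AS p
INNER JOIN tb_user AS u ON p.USER_ID = u.ID
ORDER BY p.RATING DESC
LIMIT 30;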
What happens if you take the ORDER BY off; does that have a performance impact? If it has a large effect, then maybe consider indexing tb_post.RATING.
Karl