Prepared Statement does not use expected index - php

I have a very large table of IoT samples that I'm trying to run a relatively simple query against. Running the query normally from the MySQL CLI returns a result in ~0.07 seconds. If I first prepare the query, either via PDO or by running a SQL PREPARE statement, the request takes over a minute.
I've enabled the optimizer trace feature, and it looks like when the statement is prepared, MySQL ignores the index that it should use and does a filesort of the whole table. I'd appreciate any insight into whether I'm doing something wrong or whether this looks like a MySQL bug.
The table itself contains over 100 million samples, and at least 300 thousand are associated with the device being queried here. I ran these tests with MySQL 8.0.23, and the issue persisted when I upgraded to 8.0.25.
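For reference, the trace was presumably captured with the standard optimizer-trace procedure, something along these lines (a sketch, not the exact session):
SET optimizer_trace = 'enabled=on';
-- run the query under investigation here
SELECT TRACE FROM information_schema.OPTIMIZER_TRACE;
SET optimizer_trace = 'enabled=off';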
Table definition (some columns omitted):
Create Table: CREATE TABLE `samples` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`organization_id` int unsigned NOT NULL,
`device_id` int unsigned NOT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`raw_reading` int DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `samples_organization_id_foreign` (`organization_id`),
KEY `samples_reverse_device_id_created_at_organization_id_index` (`device_id`,`created_at` DESC,`organization_id`),
CONSTRAINT `samples_device_id_foreign` FOREIGN KEY (`device_id`) REFERENCES `devices` (`id`) ON DELETE RESTRICT ON UPDATE CASCADE,
CONSTRAINT `samples_organization_id_foreign` FOREIGN KEY (`organization_id`) REFERENCES `organizations` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=188315314 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
SQL that runs in < 1s:
select *
from `samples`
where `samples`.`device_id` = 5852
and `samples`.`device_id` is not null
and `id` != 188315308
order by `created_at` desc
limit 1;
SQL that runs in over a minute:
prepare test_prep from 'select * from `samples` where `samples`.`device_id` = ? and `samples`.`device_id` is not null and `id` != ? order by `created_at` desc limit 1';
set @a = 5852;
set @b = 188315308;
execute test_prep using @a, @b;
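For reference, a minimal PDO equivalent that should reproduce the slow case (connection details are placeholders, not from the original post). Note that pdo_mysql emulates prepares by default, so PDO::ATTR_EMULATE_PREPARES must be disabled to get a true server-side prepare:
$pdo = new PDO('mysql:host=localhost;dbname=iot', 'user', 'pass', [
    PDO::ATTR_EMULATE_PREPARES => false, // use a real server-side PREPARE
]);
$stmt = $pdo->prepare(
    'select * from `samples`
     where `samples`.`device_id` = ?
       and `samples`.`device_id` is not null
       and `id` != ?
     order by `created_at` desc
     limit 1'
);
$stmt->execute([5852, 188315308]); // slow: same plan as the PREPARE/EXECUTE above
$row = $stmt->fetch(PDO::FETCH_ASSOC);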
Trace for the non-prepared SQL can be found at my gist, but the relevant part is:
{
"reconsidering_access_paths_for_index_ordering": {
"clause": "ORDER BY",
"steps": [
],
"index_order_summary": {
"table": "`samples`",
"index_provides_order": true,
"order_direction": "asc",
"index": "samples_reverse_device_id_created_at_organization_id_index",
"plan_changed": false
}
}
},
Trace for the prepared query can be found at my other gist, but the relevant part is
{
"reconsidering_access_paths_for_index_ordering": {
"clause": "ORDER BY",
"steps": [
],
"index_order_summary": {
"table": "`samples`",
"index_provides_order": false,
"order_direction": "undefined",
"index": "samples_reverse_device_id_created_at_organization_id_index",
"plan_changed": false
}
}
},

The index you want to use is not that bad:
`samples_reverse_device_id_created_at_organization_id_index`
(`device_id`,`created_at` DESC,`organization_id`)
However, it is not a covering index. If query performance is really important, I would add an index that at least covers the filtering predicates. You don't need a true covering index here anyway, since you are retrieving all columns. I would try:
create index ix1 on samples (device_id, created_at, id);
EDIT
Another trick that could promote index usage is to delay the predicate id != 188315308 as long as possible. If you know this predicate will be matched by at least one row among the first 100 rows produced by the rest of the predicates, you can rephrase your query as:
select *
from (
select *
from `samples`
where `samples`.`device_id` = 5852
order by `created_at` desc
limit 100
) x
where `id` != 188315308
order by `created_at` desc
limit 1

Get rid of this, since the = 5852 test already assures that the column cannot be NULL, making this predicate redundant:
and `samples`.`device_id` is not null
Then your index, or this one, should work fine.
INDEX(device_id, created_at, id)
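With the redundant predicate removed, the statement reduces to (the original query minus the NULL test):
select *
from `samples`
where `samples`.`device_id` = 5852
  and `id` != 188315308
order by `created_at` desc
limit 1;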
Do not use @variables; the optimizer seems not to look at the value they contain. That is, instead of
set @a = 5852;
set @b = 188315308;
execute test_prep using @a, @b;
Simply do
execute test_prep using 5852, 188315308;
Consider writing a bug report at bugs.mysql.com
I suspect "order_direction": "undefined" is part of the problem.

Not a full solution, but a workaround: I added an index on just my timestamp column, and that seems to satisfy the optimizer.
KEY `samples_created_at_index` (`created_at` DESC),
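Expressed as DDL, assuming the index name shown above:
ALTER TABLE `samples` ADD INDEX `samples_created_at_index` (`created_at` DESC);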
I'm going to try to put together a minimal test case and post it over on MySQL bugs. I'll add a follow-up here if anything comes of that.

Related

Yii2 ActiveQuery join keeps returning distinct values

I have two tables as below
table halte :
CREATE TABLE `halte` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`nama` varchar(255) NOT NULL,
`lat` float(10,6) DEFAULT NULL,
`lng` float(10,6) DEFAULT NULL,
PRIMARY KEY (`id`)
)
table stops :
CREATE TABLE `stops` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`id_halte` int(11) DEFAULT NULL,
`sequence` int(2) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `id_halte` (`id_halte`)
)
I also have some other tables which don't cause any problems.
The halte table has a one-to-many relation to stops. The problem is that when I try to get rows from the halte table using a right join to the stops table, Yii only returns unique rows. Yii won't return the same halte row more than once, even when the stops table has more than one record related to the same row in the halte table.
Here's my code
$haltes = $modelHalte->find()
->rightJoin('stops', 'halte.id = stops.id_halte')
->where(['stops.id_rute'=>Yii::$app->request->get('rute')])
->orderBy('sequence')
->all();
I have tried distinct(false), but with no result.
I've also checked the debugger, and it runs exactly the query I want:
SELECT `halte`.* FROM `halte` RIGHT JOIN `stops` ON halte.id = stops.id_halte WHERE `stops`.`id_rute`='1' ORDER BY `sequence`
I tried running that query manually and it returned 29 rows, which is what I want. But in Yii it only returns 27 rows, because 2 of the rows are the same record in the halte table.
I know I can achieve this using yii\db\Query, but I want to use ActiveRecord.
Is there any way to work around this?
I would really appreciate your opinion/help.
Thanks.
Check the SQL command generated by your ActiveQuery. Note that createCommand() must be called on the query itself, before ->all() (which returns an array of models):
$query = $modelHalte->find()
->rightJoin('stops', 'halte.id = stops.id_halte')
->where(['stops.id_rute'=>Yii::$app->request->get('rute')])
->orderBy('sequence');
echo $query->createCommand()->sql;
or, to get the SQL with all parameters included, try:
echo $query->createCommand()->getRawSql();
Then compare the SQL generated by the ActiveQuery with the one you created manually.
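If the generated SQL matches what you ran manually but the row counts still differ, one thing worth trying (my suggestion, not part of the original answer) is fetching raw rows instead of ActiveRecord models:
$haltes = $modelHalte->find()
->rightJoin('stops', 'halte.id = stops.id_halte')
->where(['stops.id_rute' => Yii::$app->request->get('rute')])
->orderBy('sequence')
->asArray() // return plain arrays rather than ActiveRecord objects
->all();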

Speed up MySQL Query + PHP

I want to speed up this code; it's the query that takes the time. If I change the number of rows returned from 100 to 10, it takes almost the same amount of time (about 2 seconds). The GET parameters are based on user sort/search input. How do I improve the speed of this? The item table has about 2,374,744 rows, and the bot table about 20 rows.
$bot = " && user_items_new.bot_id != '0'";
if ($_GET['bot'] != 0) {
$bot = " && user_items_new.bot_id='".$_GET['bot']."'";
}
$name = '';
if (strlen($_GET['name']) > 0) {
$name = " && user_items_new.name LIKE '%".$_GET['name']."%'";
}
$min = '';
if (strlen($_GET['min']) > 0) {
$min = " && steam_price >= '".$_GET['min']."'";
}
$max = '';
if (strlen($_GET['max']) > 0) {
$max = " && steam_price <= '".$_GET['max']."'";
}
$order = '';
if ($_GET['order'] == 'price_desc') {
$order = "ORDER BY steam_price DESC, user_items_new.name ASC";
} elseif ($_GET['order'] == 'price_asc') {
$order = "ORDER BY steam_price ASC, user_items_new.name ASC";
} elseif ($_GET['order'] == 'name_desc') {
$order = "ORDER BY user_items_new.name DESC";
} else {
$order = "ORDER BY user_items_new.name ASC";
}
$limit = $_GET['start'];
$limit .= ', 100';
$i = 0;
$sql = mysql_query("SELECT user_item_id, user_items_new.bot_id AS item_bot_id, sticker, `key`, `case`, exterior, stattrak, image, user_items_new.name AS item_name, steam_price, color, bots_new.bot_id, bots_new.name AS bot_name, withdraw_enabled FROM user_items_new LEFT JOIN bots_new ON user_items_new.bot_id=bots_new.bot_id WHERE steam_price > '0.1' && deposit_start='0' && deposited='0' && user_id='0' && withdraw_enabled='1' ".$bot." ".$name." ".$min." ".$max." ".$order." LIMIT ".$limit)or die(mysql_error());
while ($item = mysql_fetch_assoc($sql)) {
//...
}
The item table looks like this (dumped from phpMyAdmin):
CREATE TABLE IF NOT EXISTS `user_items_new` (
`user_item_id` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
`bot_id` int(11) NOT NULL,
`item_original_id` varchar(22) NOT NULL,
`item_real_id` varchar(22) NOT NULL,
`class_id` varchar(22) NOT NULL,
`weapon_id` int(11) NOT NULL,
`name` text NOT NULL,
`image` text NOT NULL,
`case` int(11) NOT NULL,
`key` int(11) NOT NULL,
`sticker` int(11) NOT NULL,
`capsule` int(11) NOT NULL,
`holo` int(11) NOT NULL,
`name_tag` int(11) NOT NULL,
`access_pass` int(11) NOT NULL,
`stattrak` int(11) NOT NULL,
`color` varchar(32) NOT NULL,
`exterior` text NOT NULL,
`steam_price` double NOT NULL,
`deposited` int(11) NOT NULL,
`deposit_start` int(11) NOT NULL
) ENGINE=InnoDB AUTO_INCREMENT=5219079 DEFAULT CHARSET=utf8;
ALTER TABLE `user_items_new`
ADD PRIMARY KEY (`user_item_id`), ADD KEY `user_id` (`user_id`), ADD KEY `bot_id` (`bot_id`);
ALTER TABLE `user_items_new`
MODIFY `user_item_id` int(11) NOT NULL AUTO_INCREMENT,AUTO_INCREMENT=5219079;
And then the bot table:
CREATE TABLE IF NOT EXISTS `bots_new` (
`bot_id` int(11) NOT NULL,
`name` varchar(64) NOT NULL,
`username` varchar(64) NOT NULL,
`password` varchar(64) NOT NULL,
`deposit_enabled` int(11) NOT NULL,
`withdraw_enabled` int(11) NOT NULL,
`ident` varchar(32) NOT NULL
) ENGINE=MyISAM AUTO_INCREMENT=19 DEFAULT CHARSET=utf8;
ALTER TABLE `bots_new`
ADD PRIMARY KEY (`bot_id`);
Edit (adding prettyprinted SELECT)
SELECT user_item_id, user_items_new.bot_id AS item_bot_id, sticker,
`key`, `case`, exterior, stattrak, image, user_items_new.name AS item_name,
steam_price, color, bots_new.bot_id, bots_new.name AS bot_name,
withdraw_enabled
FROM user_items_new
LEFT JOIN bots_new ON user_items_new.bot_id=bots_new.bot_id
WHERE user_items_new.bot_id != '0' && deposit_start='0' && deposited='0' && user_id='0' && withdraw_enabled='1'
ORDER BY user_items_new.name ASC
LIMIT , 100
How to speed this up...
Firstly, add a composite index on the columns that have predicates with equality comparisons first, e.g.
... ON user_items_new (user_id,deposited,deposit_start)
This will be of benefit if the predicates are filtering out a large number of rows. For example, if less than 10% of the rows satisfy the condition user_id = 0.
As an aside, the predicate withdraw_enabled='1' will negate the "outerness" of the LEFT JOIN. The result from the query will be equivalent if the keyword LEFT is omitted.
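That is, with that predicate in place the join could equivalently be written as an inner join:
... FROM user_items_new JOIN bots_new ON user_items_new.bot_id = bots_new.bot_id ...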
Another issue is that the ORDER BY will cause a "Using filesort" operation to sort the rows. The entire set will need to be sorted, before the LIMIT clause is applied. So we don't expect LIMIT 10 to be any faster than LIMIT 1000, apart from the additional time for the client to transfer an additional 990 rows. (The bit about sorting the entire set isn't entirely true; in some cases MySQL can abort the sort operation after identifying the first "limit" number of rows. But MySQL will still need to go through the entire set to get those first rows.)
It may be possible to avoid the sort by also adding the column(s) in the ORDER BY clause to the index; these would need to appear immediately following the columns referenced in the equality predicates. It may also be necessary to specify those same columns in the ORDER BY clause.
Assuming the current query includes:
...
WHERE ...
&& deposit_start='0' && u.deposited='0' && u.user_id='0' ...
...
ORDER BY steam_price ASC, user_items_new.name ASC
This index may be appropriate:
... ON user_items_new (user_id,deposited,deposit_start,steam_price,name)
The output from EXPLAIN will show whether that index is used for the query or not. Beyond the equality comparisons of the first three columns, MySQL can use a range scan operation on the index to satisfy the steam_price > predicate.
There's also the issue of the InnoDB buffer pool; how much memory is allocated to holding index and data pages in memory, to avoid storage i/o.
To avoid lookups to data pages in the underlying table, you can consider creating a covering index for the query. A covering index includes all of the columns referenced from the table, so the query can be satisfied entirely from the index. The EXPLAIN output will show "Using index" in the Extra column if the query is using a covering index. (But there are limits to the number of columns and the total row size in the index. This would most benefit the performance of the query when the table rows are large, and the size of the columns in the index is a small subset of the total table row.)
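To check which plan you are actually getting, a hedged example of running EXPLAIN on the price-sorted variant (parameter values are the question's defaults):
EXPLAIN
SELECT user_item_id, steam_price, user_items_new.name
FROM user_items_new
WHERE user_id = 0 AND deposited = 0 AND deposit_start = 0
  AND steam_price > 0.1
ORDER BY steam_price ASC, user_items_new.name ASC
LIMIT 100;
If the suggested index is used, the key column will name it and Extra should not contain "Using filesort".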
With a table of that size, one of the simplest tricks you can use to optimize the query is to add indexes on the fields you use in the WHERE clause. This gives the optimizer presorted structures to work with for the queries you run most often.
For example, you should see significant gains by doing:
ALTER TABLE user_items_new ADD INDEX (steam_price);
The data and data types go a long way in determining the actual gains. Adding indexes on every field will make the query less efficient, not more; so more is not necessarily better.
Your query is slow because your query against the user_items_new table requires inspecting 1.2 million rows. While you have indexes for user_item_id, user_id, and bot_id, those can only filter your results so far.
You will want to add indexes on some of your data columns. Which indexes you will want to add (and whether any of them are compound or not) is going to depend on the actual contents of the table and would be difficult to recommend without more information.
You will want to add indexes based on which columns' distinct values most significantly reduce the data that must be examined; an index on withdraw_enabled, for example, is not likely to gain much unless very few rows have withdraw_enabled = 1. An index on steam_price will be beneficial if very few of your rows have a steam_price >= 0.1.
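A quick way to gauge that selectivity before adding indexes (a generic sketch, not from the original answer; in MySQL, summing a boolean expression counts the matching rows):
SELECT COUNT(*) AS total_rows,
       SUM(user_id = 0) AS unowned,
       SUM(deposited = 0 AND deposit_start = 0) AS not_deposited,
       SUM(steam_price > 0.1) AS above_min_price
FROM user_items_new;
Columns where the matching count is a small fraction of total_rows are the best index candidates.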

MySQL INSERT IGNORE Adding 1 to Non-Indexed column

I'm building a small report in a PHP while loop.
The query I'm running inside the while() loop is this:
INSERT IGNORE INTO `tbl_reporting` SET datesubmitted = '2015-05-26', submissiontype = 'email', outcome = 0, totalcount = totalcount+1
I'm expecting the totalcount column to increment every time the query is run.
But the number stays at 1.
The UNIQUE index composes the first 3 columns.
Here's the Table Schema:
CREATE TABLE `tbl_reporting` (
`datesubmitted` date NOT NULL,
`submissiontype` varchar(20) COLLATE utf8mb4_unicode_ci NOT NULL,
`outcome` tinyint(1) unsigned NOT NULL DEFAULT '0',
`totalcount` mediumint(5) unsigned NOT NULL DEFAULT '0',
UNIQUE KEY `datesubmitted` (`datesubmitted`,`submissiontype`,`outcome`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
When I modify the query into a regular UPDATE statement:
UPDATE `tbl_reporting` SET totalcount = totalcount+1 WHERE datesubmitted = '2015-05-26' AND submissiontype = 'email' AND outcome = 1
...it works.
Does INSERT IGNORE not allow adding numbers? Or is my original query malformed?
I'd like to use the INSERT IGNORE, otherwise I'll have to query for the original record first, then insert, then eventually update.
Think of what you're doing:
INSERT .... totalcount=totalcount+1
To calculate totalcount+1, the DB has to retrieve the current value of totalcount... which doesn't exist yet, because you're CREATING a new record, and there is NO existing data to retrieve the "old" value from.
e.g. you're trying to eat your cake before you've even gone to the store to buy the ingredients, let alone mixed and baked them.
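For completeness (this is standard MySQL, though not part of the answer above): the usual way to get insert-or-increment behavior in a single statement is INSERT ... ON DUPLICATE KEY UPDATE, which uses the table's UNIQUE key exactly as the original query hoped IGNORE would:
INSERT INTO `tbl_reporting` (datesubmitted, submissiontype, outcome, totalcount)
VALUES ('2015-05-26', 'email', 0, 1)
ON DUPLICATE KEY UPDATE totalcount = totalcount + 1;
The first run inserts the row with totalcount = 1; subsequent runs collide with the UNIQUE key and increment the existing value instead.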

Counting rows via PHP is faster than COUNT in SQL?

In short my question is this: Why is this
SELECT r.x, r.y FROM `base` AS r
WHERE r.l=50 AND r.n<>'name' AND 6=(SELECT COUNT(*) FROM surround AS d
WHERE d.x >= r.x -1 AND d.x <= r.x +1 AND
d.y>=r.y -1 AND d.y<=r.y +1 AND d.n='name')
a lot slower than this:
$q="SELECT x,y FROM `base` WHERE l=50 AND n<>'name'";
$sr=mysql_query($q);
if(mysql_num_rows($sr)>=1){
while($row=mysql_fetch_assoc($sr)){
$q2="SELECT x,y FROM surround WHERE n='name' AND x<=".
($row["x"]+1)." AND x>=".($row["x"]-1).
" AND y<=".($row["y"]+1)." AND y>=".($row["y"]-1)." ";
$sr2=mysql_query($q2);
if(mysql_num_rows($sr2)==6){
echo $row['x'].','.$row['y']."\n";
}
}
}
The PHP version takes about 300 ms to complete. If I run the "pure SQL" version, be it via phpMyAdmin or via PHP, it takes roughly 5 seconds (and even 13 seconds when I used BETWEEN for those ranges of x and y).
I would expect the SQL version to be faster and more efficient in general, so I wonder: am I doing something wrong, or does this make sense?
EDIT: I added the structure of both tables, as requested:
CREATE TABLE IF NOT EXISTS `base` (
`bid` int(12) NOT NULL COMMENT 'Base ID',
`n` varchar(25) NOT NULL COMMENT 'Name',
`l` int(3) NOT NULL,
`x` int(3) NOT NULL,
`y` int(3) NOT NULL,
`LastModified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
UNIQUE KEY `coord` (`x`,`y`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `surround` (
`bid` int(12) NOT NULL COMMENT 'Base ID',
`n` varchar(25) NOT NULL COMMENT 'Name',
`l` int(3) NOT NULL,
`x` int(3) NOT NULL,
`y` int(3) NOT NULL,
`LastModified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
UNIQUE KEY `coord` (`x`,`y`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
EDIT 2:
EXPLAIN SELECT for the query above: (the key coord is the combination of x and y)
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY r range coord,n coord 4 NULL 4998 Using where
2 DEPENDENT SUBQUERY d ALL coord NULL NULL NULL 57241 Range checked for each record (index map: 0x1)
You are joining the two tables yourself; you are acting as the optimizer, and you chose the base table as the outer table of a nested-loop join. I would guess that MySQL's optimizer produced an execution plan that was not the same as yours.
That is why people ask for EXPLAIN output: to see the join order and to check whether an index was used.
By the way, can you try this query?
SELECT r.x, r.y
FROM `base` AS r, surround AS d
WHERE r.l=50
AND r.n<>'name'
AND d.x >= r.x -1
AND d.x <= r.x +1
AND d.y>=r.y -1
AND d.y<=r.y +1
AND d.n='name'
GROUP BY r.x, r.y
HAVING COUNT(*) = 6
UPDATED
How your original query works:
This was my first time seeing Range checked for each record (index map: 0x1), so I can't fully figure out how your query works. The MySQL manual gives some information about it. It appears that every row in surround (surround has 57k rows?) is compared against base's x and y. If so, your query is evaluated as a three-level nested-loop join (base => surround => base), and moreover every row in surround is compared, which is inefficient.
I will make more of an effort to work out how it executes later; it's time for work.
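One plausible mitigation, offered here as an assumption rather than something tested against this data: give surround an index matching the correlated subquery's predicates, equality column first, so each probe can seek rather than scan all ~57k rows:
ALTER TABLE `surround` ADD INDEX `n_coord` (`n`, `x`, `y`);
-- n is compared with equality, x and y with ranges, so n leads the index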

Deleting a row under certain conditions with MySQL

I want to automatically delete rows when the table (shown below) gets a new insert, if certain conditions are met.
When:
There are rows referring to the same 'field' with the same 'user_id'
Their 'field', 'display' and 'search' columns are the same
Simply put, when two rows would become duplicates (apart from the 'group_id' column), the row with the non-null 'group_id' should be deleted; otherwise a row should be updated or inserted.
Is there a way to set this up in MySQL (in the spirit of "ON DUPLICATE do stuff" combined with unique keys, etc.), or do I have to check for it explicitly in PHP (with multiple queries)?
Additional info:
There should always be a row with NULL 'group_id' for every possible 'field' (there's a limited set, defined elsewhere). On the other hand there might not be one with a non null 'group_id'.
CREATE TABLE `Views` (
`user_id` SMALLINT(5) UNSIGNED NOT NULL,
`db` ENUM('db_a','db_b') NOT NULL COLLATE 'utf8_swedish_ci',
`field` VARCHAR(40) NOT NULL COLLATE 'utf8_swedish_ci',
`display` TINYINT(1) UNSIGNED NOT NULL,
`search` TINYINT(1) UNSIGNED NOT NULL,
`group_id` SMALLINT(6) UNSIGNED NULL DEFAULT NULL,
UNIQUE INDEX `user_id` (`field`, `db`, `user_id`),
INDEX `Views_ibfk_1` (`user_id`),
INDEX `group_id` (`group_id`),
CONSTRAINT `Views_ibfk_1` FOREIGN KEY (`user_id`) REFERENCES `User` (`id`) ON
UPDATE CASCADE ON DELETE CASCADE
)
COLLATE='utf8_swedish_ci'
ENGINE=InnoDB;
I think you need to revise your logic. It makes no sense to insert a row only to delete another row. Why not just update the Group_ID field in the duplicate row to the value being inserted? Below is a rough idea of how I would go about it.
N.B. I haven't done much work with MySQL and cannot get the below to run on SQLFiddle, though based on the MySQL docs I can't work out why. (One likely reason: MySQL only allows IF ... THEN ... END IF blocks inside stored programs, not as standalone statements.) Perhaps someone more versed in MySQL can correct me?
SET @User_ID = 1;
SET @db = 'db_a';
SET @Field = 'Field';
SET @Display = 1;
SET @Search = 1;
SET @Group_ID = 1;
IF EXISTS
( SELECT 1
FROM Views
WHERE User_ID = @User_ID
AND DB = @db
AND Field = @Field
AND Group_ID IS NOT NULL
)
THEN
UPDATE Views
SET Group_ID = @Group_ID,
Display = @Display,
Search = @Search
WHERE User_ID = @User_ID
AND DB = @db
AND Field = @Field
AND Group_ID IS NOT NULL;
ELSE
INSERT INTO Views (User_ID, DB, Field, Display, Search, Group_ID)
VALUES (@User_ID, @db, @Field, @Display, @Search, @Group_ID);
END IF;
Alternatively (and my preferred solution), add a Timestamp field to your table and create a view as follows:
SELECT v.User_ID, v.DB, v.Field, v.Display, v.Search, v.Group_ID
FROM Views v
INNER JOIN
( SELECT User_ID, DB, Field, MAX(CreatedDate) AS CreatedDate
FROM Views
WHERE Group_ID IS NOT NULL
GROUP BY User_ID, DB, Field
) MaxView
ON MaxView.User_ID = v.User_ID
AND MaxView.DB = v.DB
AND MaxView.Field = v.Field
AND MaxView.CreatedDate = v.CreatedDate
WHERE v.Group_ID IS NOT NULL
UNION ALL
SELECT v.User_ID, v.DB, v.Field, v.Display, v.Search, v.Group_ID
FROM Views v
WHERE v.Group_ID IS NULL
This would allow you to track changes to your data properly, without compromising the need to be able to view unique records.
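For completeness, the timestamp column the view relies on (CreatedDate is the name assumed above; the answer itself only says "add a Timestamp field") could be added with something like:
ALTER TABLE `Views` ADD COLUMN `CreatedDate` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP;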
DELETE FROM Views WHERE group_id IS NOT NULL
Your question is not easy to understand, so I'm not sure this is what you want:
DELETE FROM Views WHERE # delete from the table views
group_id IS NOT NULL AND # first condition delete only rows with not null group_id
(SELECT count(*) as tot FROM Views GROUP BY group_id) = 1 # second condition count the difference in group id
If that's not what you want, please update your question with more details...
