Speed up MySQL Query + PHP

I want to speed up this code; it is the query that takes the time. If I change the number of rows returned from 100 to 10, it takes almost the same amount of time (about 2 seconds). The GET parameters come from user sort/search input. How do I improve the speed of this? The item table has about 2,374,744 rows, and the bot table about 20 rows.
$bot = " && user_items_new.bot_id != '0'";
if ($_GET['bot'] != 0) {
$bot = " && user_items_new.bot_id='".$_GET['bot']."'";
}
$name = '';
if (strlen($_GET['name']) > 0) {
$name = " && user_items_new.name LIKE '%".$_GET['name']."%'";
}
$min = '';
if (strlen($_GET['min']) > 0) {
$min = " && steam_price >= '".$_GET['min']."'";
}
$max = '';
if (strlen($_GET['max']) > 0) {
$max = " && steam_price <= '".$_GET['max']."'";
}
$order = '';
if ($_GET['order'] == 'price_desc') {
$order = "ORDER BY steam_price DESC, user_items_new.name ASC";
} elseif ($_GET['order'] == 'price_asc') {
$order = "ORDER BY steam_price ASC, user_items_new.name ASC";
} elseif ($_GET['order'] == 'name_desc') {
$order = "ORDER BY user_items_new.name DESC";
} else {
$order = "ORDER BY user_items_new.name ASC";
}
$limit = $_GET['start'];
$limit .= ', 100';
$i = 0;
$sql = mysql_query("SELECT user_item_id, user_items_new.bot_id AS item_bot_id, sticker, `key`, `case`, exterior, stattrak, image, user_items_new.name AS item_name, steam_price, color, bots_new.bot_id, bots_new.name AS bot_name, withdraw_enabled FROM user_items_new LEFT JOIN bots_new ON user_items_new.bot_id=bots_new.bot_id WHERE steam_price > '0.1' && deposit_start='0' && deposited='0' && user_id='0' && withdraw_enabled='1' ".$bot." ".$name." ".$min." ".$max." ".$order." LIMIT ".$limit)or die(mysql_error());
while ($item = mysql_fetch_assoc($sql)) {
    //...
}
The item table looks like this (dumped from phpMyAdmin):
CREATE TABLE IF NOT EXISTS `user_items_new` (
`user_item_id` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
`bot_id` int(11) NOT NULL,
`item_original_id` varchar(22) NOT NULL,
`item_real_id` varchar(22) NOT NULL,
`class_id` varchar(22) NOT NULL,
`weapon_id` int(11) NOT NULL,
`name` text NOT NULL,
`image` text NOT NULL,
`case` int(11) NOT NULL,
`key` int(11) NOT NULL,
`sticker` int(11) NOT NULL,
`capsule` int(11) NOT NULL,
`holo` int(11) NOT NULL,
`name_tag` int(11) NOT NULL,
`access_pass` int(11) NOT NULL,
`stattrak` int(11) NOT NULL,
`color` varchar(32) NOT NULL,
`exterior` text NOT NULL,
`steam_price` double NOT NULL,
`deposited` int(11) NOT NULL,
`deposit_start` int(11) NOT NULL
) ENGINE=InnoDB AUTO_INCREMENT=5219079 DEFAULT CHARSET=utf8;
ALTER TABLE `user_items_new`
ADD PRIMARY KEY (`user_item_id`), ADD KEY `user_id` (`user_id`), ADD KEY `bot_id` (`bot_id`);
ALTER TABLE `user_items_new`
MODIFY `user_item_id` int(11) NOT NULL AUTO_INCREMENT,AUTO_INCREMENT=5219079;
And then the bot table:
CREATE TABLE IF NOT EXISTS `bots_new` (
`bot_id` int(11) NOT NULL,
`name` varchar(64) NOT NULL,
`username` varchar(64) NOT NULL,
`password` varchar(64) NOT NULL,
`deposit_enabled` int(11) NOT NULL,
`withdraw_enabled` int(11) NOT NULL,
`ident` varchar(32) NOT NULL
) ENGINE=MyISAM AUTO_INCREMENT=19 DEFAULT CHARSET=utf8;
ALTER TABLE `bots_new`
ADD PRIMARY KEY (`bot_id`);
Edit (adding prettyprinted SELECT)
SELECT user_item_id, user_items_new.bot_id AS item_bot_id, sticker,
`key`, `case`, exterior, stattrak, image, user_items_new.name AS item_name,
steam_price, color, bots_new.bot_id, bots_new.name AS bot_name,
withdraw_enabled
FROM user_items_new
LEFT JOIN bots_new ON user_items_new.bot_id=bots_new.bot_id
WHERE user_items_new.bot_id != '0' && deposit_start='0' && deposited='0' && user_id='0' && withdraw_enabled='1'
ORDER BY user_items_new.name ASC
LIMIT , 100

How to speed this up...
Firstly, add a composite index, listing first the columns that have equality predicates, e.g.
... ON user_items_new (user_id,deposited,deposit_start)
This will be of benefit if the predicates are filtering out a large number of rows. For example, if less than 10% of the rows satisfy the condition user_id = 0.
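Spelled out in full, that suggestion would look something like this (the index name idx_items_filter is just illustrative):
ALTER TABLE user_items_new
    ADD INDEX idx_items_filter (user_id, deposited, deposit_start);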
As an aside, the predicate withdraw_enabled='1' will negate the "outerness" of the LEFT JOIN. The result from the query will be equivalent if the keyword LEFT is omitted.
Another issue is that the ORDER BY will cause a "Using filesort" operation to sort the rows. The entire set will need to be sorted, before the LIMIT clause is applied. So we don't expect LIMIT 10 to be any faster than LIMIT 1000, apart from the additional time for the client to transfer an additional 990 rows. (The bit about sorting the entire set isn't entirely true; in some cases MySQL can abort the sort operation after identifying the first "limit" number of rows. But MySQL will still need to go through the entire set to get those first rows.)
It may be possible to avoid the sort entirely by adding the column(s) from the ORDER BY clause to the index, immediately following the columns referenced in the equality predicates. The ORDER BY clause would also need to list those same columns, in index order.
Assuming the current query includes:
...
WHERE ...
&& deposit_start='0' && u.deposited='0' && u.user_id='0' ...
...
ORDER BY steam_price ASC, user_items_new.name ASC
This index may be appropriate:
... ON user_items_new (user_id,deposited,deposit_start,steam_price,name)
The output from EXPLAIN will show whether that index is used for the query or not. Beyond the equality comparisons of the first three columns, MySQL can use a range scan operation on the index to satisfy the steam_price > predicate.
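One caveat when trying that against the posted schema: name is a TEXT column, so MySQL can only index it with a prefix length, and a prefix cannot fully satisfy an ORDER BY on the column. A sketch of the index and the check (the index name and prefix length are illustrative):
ALTER TABLE user_items_new
    ADD INDEX idx_items_sort (user_id, deposited, deposit_start, steam_price, name(64));
EXPLAIN SELECT user_item_id, steam_price
FROM user_items_new
WHERE user_id = 0 AND deposited = 0 AND deposit_start = 0
    AND steam_price > 0.1
ORDER BY steam_price;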
There's also the issue of the InnoDB buffer pool: how much memory is allocated to holding index and data pages in memory, to avoid storage I/O.
To avoid lookups to data pages in the underlying table, you can consider creating a covering index for the query. A covering index includes all of the columns referenced from the table, so the query can be satisfied entirely from the index. The EXPLAIN output will show "Using index" in the Extra column if the query is using a covering index. (But there are limits to the number of columns and the total row size in the index. This benefits performance most when the table rows are large and the indexed columns are a small subset of the total row.)

With a table of that size, one of the simplest tricks you can use for optimizing the query is to add indexes on the fields you use in the WHERE clause. This gives the optimizer presorted structures to work from for the queries you use most often.
For example, you should see significant gains by doing:
ALTER TABLE user_items_new ADD INDEX (steam_price);
The data and data types go a long way in determining the actual gains. Adding indexes on all fields will set you backwards, since every index must be maintained on writes. So more is not necessarily better.

Your query is slow because it requires inspecting 1.2 million rows of the user_items_new table. While you have indexes for user_item_id, user_id, and bot_id, those can only filter your results so far.
You will want to add indexes on some of your data columns. Which indexes you will want to add (and whether any of them are compound or not) is going to depend on the actual contents of the table and would be difficult to recommend without more information.
You will want to add indexes on columns whose values significantly reduce the number of rows that must be examined. An index on withdraw_enabled, for example, is not likely to gain much unless very few rows have withdraw_enabled = 1; an index on steam_price will be beneficial if very few of your rows have a steam_price >= 0.1.
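If you're unsure how selective each predicate is, you can measure it before adding indexes; a quick sketch:
SELECT COUNT(*) AS total_rows,
       SUM(user_id = 0) AS rows_user_id_zero,
       SUM(steam_price >= 0.1) AS rows_price_at_least_min
FROM user_items_new;
Columns where the matching count is a small fraction of total_rows are the best index candidates.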

Related

How do I optimize MYSQL select statements which execute 4000 times in total in a table with 50M rows

This is what my code looks like:
foreach ($instruments as $instrument) {
    $stmt = $pdo->prepare("SELECT date, adjusted_close, close FROM ehd_historical_data WHERE exchange = ? AND symbol = ? AND date >= ? ORDER BY date asc LIMIT 1");
    $stmt->execute([xyzToExchange($instrument), xyzToSymbol($instrument), $startDate]);
    $data1 = $stmt->fetch(PDO::FETCH_ASSOC);
    $stmt = $pdo->prepare("SELECT date, adjusted_close, close FROM ehd_historical_data WHERE exchange = ? AND symbol = ? ORDER BY date desc LIMIT 1");
    $stmt->execute([xyzToExchange($instrument), xyzToSymbol($instrument)]);
    $data2 = $stmt->fetch(PDO::FETCH_ASSOC);
}
There are around 2000 instruments, each a string in the format "NASDAQ:AAPL".
It currently takes 7 seconds to complete since the database has around 50 million rows.
So far:
I have set an INDEX on exchange, symbol, and date together.
Set another INDEX on exchange and symbol together.
What more can I do to optimize this query?
Note:
The function this code is part of finds the price difference and the percent change between the start date and today's date. The start date can be anything: 6 months ago, 3 months ago, etc.
I tried merging them into one large query and executing that. Still the same problem.
Update:
EXPLAIN for both queries
Table Schema
CREATE TABLE `ehd_historical_data` (
`exchange` varchar(255) NOT NULL,
`symbol` varchar(255) NOT NULL,
`date` date NOT NULL,
`open` decimal(20,10) NOT NULL,
`high` decimal(20,10) NOT NULL,
`low` decimal(20,10) NOT NULL,
`close` decimal(20,10) NOT NULL,
`adjusted_close` decimal(20,10) NOT NULL,
`volume` decimal(20,0) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
ALTER TABLE `ehd_historical_data`
ADD UNIQUE KEY `exchange_2` (`exchange`,`symbol`,`date`),
ADD KEY `exchange` (`exchange`),
ADD KEY `date` (`date`),
ADD KEY `symbol` (`symbol`),
ADD KEY `exchange_3` (`exchange`,`symbol`);
COMMIT;
Try selecting both rows in a single query using row_number()
select *
from (
SELECT date, adjusted_close, close,
row_number() over(order by date desc) rn1,
row_number() over(order by date asc) rn2
FROM ehd_historical_data
WHERE exchange = ? AND symbol = ? AND date >= ?
) t
where rn1 = 1 or rn2 = 1
You may also request all symbols at once. Note the PARTITION BY clause:
select *
from (
SELECT exchange, symbol, date, adjusted_close, close,
row_number() over(partition by exchange, symbol order by date desc) rn1,
row_number() over(partition by exchange, symbol order by date asc) rn2
FROM ehd_historical_data
WHERE ((exchange = 'NASDAQ' AND symbol = 'AAPL') OR (exchange = 'NASDAQ' AND symbol = 'MSFT') OR (exchange = 'NASDAQ' AND symbol = 'TSLA')) AND date >= ?
) t
where rn1 = 1 or rn2 = 1
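From PHP, the (exchange, symbol) pairs can be assembled with placeholders rather than hard-coded literals; a sketch, assuming PDO and the question's xyzToExchange/xyzToSymbol helpers:
$pairs  = [];
$params = [];
foreach ($instruments as $instrument) {
    $pairs[]  = "(exchange = ? AND symbol = ?)";
    $params[] = xyzToExchange($instrument);
    $params[] = xyzToSymbol($instrument);
}
$params[] = $startDate;
// This fragment slots into the WHERE clause of the query above.
$where = "(" . implode(" OR ", $pairs) . ") AND date >= ?";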
Your index (exchange,symbol,date) is optimal for both of those SELECTs, so let's dig into other causes for sluggishness.
CREATE TABLE `ehd_historical_data` (
`exchange` varchar(255) NOT NULL, -- Don't use 255 if you don't need it
`symbol` varchar(255) NOT NULL, -- ditto
`date` date NOT NULL,
`open` decimal(20,10) NOT NULL, -- overkill
`high` decimal(20,10) NOT NULL,
`low` decimal(20,10) NOT NULL,
`close` decimal(20,10) NOT NULL,
`adjusted_close` decimal(20,10) NOT NULL,
`volume` decimal(20,0) NOT NULL
) ENGINE=InnoDB DEFAULT
CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci; -- probably all are ascii
ALTER TABLE `ehd_historical_data`
ADD UNIQUE KEY `exchange_2` (`exchange`,`symbol`,`date`), -- Change to PRIMARY KEY
ADD KEY `exchange` (`exchange`), -- redundant, DROP
ADD KEY `date` (`date`),
ADD KEY `symbol` (`symbol`),
ADD KEY `exchange_3` (`exchange`,`symbol`); -- redundant, DROP
COMMIT;
Have another table of symbols; reference it from this table with a MEDIUMINT UNSIGNED id (see the sketch below).
decimal(20,10) takes 10 bytes; I know of no symbol that needs that much precision or range.
The above comments are aimed at making the table smaller. If the table is currently bigger than will fit in cache, I/O will be the cause of sluggishness.
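A sketch of that symbols lookup table (the table name, column names, and sizes are illustrative):
CREATE TABLE symbols (
    symbol_id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
    exchange  VARCHAR(16) NOT NULL,
    symbol    VARCHAR(16) NOT NULL,
    PRIMARY KEY (symbol_id),
    UNIQUE KEY (exchange, symbol)
) ENGINE=InnoDB;
ehd_historical_data would then carry a single 3-byte symbol_id on every row instead of two VARCHARs.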
How much RAM do you have? What is the value of innodb_buffer_pool_size?
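You can check the current setting with:
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';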
How fast does this run? (I'm thinking there might be a way to get all 2000 results in a single SQL. This might be a component of it.)
SELECT exchange, symbol,
MIN(date) AS date1,
MAX(date) AS date2
FROM ehd_historical_data
WHERE date > ?
GROUP BY exchange, symbol
That would be JOINed back to the table twice, once for each date.
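A sketch of that join-back (the d/f/l aliases are illustrative):
SELECT d.exchange, d.symbol,
       f.date AS start_date, f.adjusted_close AS start_adjusted_close,
       l.date AS end_date, l.adjusted_close AS end_adjusted_close
FROM (
    SELECT exchange, symbol, MIN(date) AS date1, MAX(date) AS date2
    FROM ehd_historical_data
    WHERE date > ?
    GROUP BY exchange, symbol
) d
JOIN ehd_historical_data f
    ON f.exchange = d.exchange AND f.symbol = d.symbol AND f.date = d.date1
JOIN ehd_historical_data l
    ON l.exchange = d.exchange AND l.symbol = d.symbol AND l.date = d.date2;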

Prepared Statement does not use expected index

I have a very large table of IoT samples that I'm trying to run a relatively simple query against. Running the query normally using the MySql CLI returns a result in ~0.07 seconds. If I first prepare the query, either via PDO or by running a SQL PREPARE statement, the request takes over a minute.
I've enabled the optimizer trace feature, and it looks like when the statement is prepared, MySql ignores the index that it should use and does a file sort of the whole table. I'd like any insight into whether I am doing something wrong or whether this looks like a MySql bug.
The table itself contains over 100 million samples, and at least 300 thousand are associated with the device being queried here. I ran these tests with MySql 8.0.23, but the issue persisted when I upgraded to 8.0.25.
Table definition (some data rows omitted):
Create Table: CREATE TABLE `samples` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`organization_id` int unsigned NOT NULL,
`device_id` int unsigned NOT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`raw_reading` int DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `samples_organization_id_foreign` (`organization_id`),
KEY `samples_reverse_device_id_created_at_organization_id_index` (`device_id`,`created_at` DESC,`organization_id`),
CONSTRAINT `samples_device_id_foreign` FOREIGN KEY (`device_id`) REFERENCES `devices` (`id`) ON DELETE RESTRICT ON UPDATE CASCADE,
CONSTRAINT `samples_organization_id_foreign` FOREIGN KEY (`organization_id`) REFERENCES `organizations` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=188315314 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
SQL that runs in < 1s:
select *
from `samples`
where `samples`.`device_id` = 5852
and `samples`.`device_id` is not null
and `id` != 188315308
order by `created_at` desc
limit 1;
SQL that runs in over a minute:
prepare test_prep from 'select * from `samples` where `samples`.`device_id` = ? and `samples`.`device_id` is not null and `id` != ? order by `created_at` desc limit 1';
set @a = 5852;
set @b = 188315308;
execute test_prep using @a, @b;
Trace for the non-prepared SQL can be found at my gist, but the relevant part is:
{
  "reconsidering_access_paths_for_index_ordering": {
    "clause": "ORDER BY",
    "steps": [
    ],
    "index_order_summary": {
      "table": "`samples`",
      "index_provides_order": true,
      "order_direction": "asc",
      "index": "samples_reverse_device_id_created_at_organization_id_index",
      "plan_changed": false
    }
  }
},
Trace for the prepared query can be found at my other gist, but the relevant part is
{
  "reconsidering_access_paths_for_index_ordering": {
    "clause": "ORDER BY",
    "steps": [
    ],
    "index_order_summary": {
      "table": "`samples`",
      "index_provides_order": false,
      "order_direction": "undefined",
      "index": "samples_reverse_device_id_created_at_organization_id_index",
      "plan_changed": false
    }
  }
},
The index you want to use is not that bad:
`samples_reverse_device_id_created_at_organization_id_index`
(`device_id`,`created_at` DESC,`organization_id`)
However, it is not a covering index. If query performance is really important, I would add an index that at least covers the filtering predicate. You don't need a true covering index, since you are retrieving all columns. I would try:
create index ix1 on samples (device_id, created_at, id);
EDIT
Another trick that could promote use of the index is to delay the predicate id != 188315308 as much as possible. If you know that this predicate will be matched by at least one row among the first 100 rows produced by the rest of the predicates, you can try rephrasing your query as:
select *
from (
select *
from `samples`
where `samples`.`device_id` = 5852
order by `created_at` desc
limit 100
) x
where `id` != 188315308
order by `created_at` desc
limit 1
Get rid of this, since = 5852 already assures that the column cannot be NULL, making the test redundant:
and `samples`.`device_id` is not null
Then your index, or this one, should work fine.
INDEX(device_id, created_at, id)
Do not use @variables; the Optimizer seems not to look at the value they contain. That is, instead of
set @a = 5852;
set @b = 188315308;
execute test_prep using @a, @b;
inline the constants into the statement text itself. (EXECUTE ... USING accepts only user variables, so literal values cannot be passed there directly.)
Consider writing a bug report at bugs.mysql.com
I suspect "order_direction": "undefined" is part of the problem.
Not a full solution, but a workaround: I added an index on just my timestamp column, and that seems to satisfy the optimizer.
KEY `samples_created_at_index` (`created_at` DESC),
I'm going to try to clean up a minimal test case and post it over on MySql bugs. I'll add a followup here if anything comes of that.

MySQL "NOT IN" Query Optimization

Optimizing MySQL queries isn't my expertise, so I was wondering if someone could help me formulate the most optimal query here (and indices).
As background, I'm trying to find a distinct visitor id within a table of transactions with certain where criteria (date range, not a certain product, etc. as you see in the query below). Transactions and visitors have a one to many relationship, so there can be many transactions to a single visitor.
Another requirement for the results is that if a visitor_id is found in the result, it must be the first instance of a visitor_id (by date_time) in the entire table. In other words, the visitor_id should only exist in the date range set in the primary query and at no time beforehand.
Here's what I've put together so far. It uses NOT IN with a subquery, which doesn't seem ideal: the query takes 2-3 seconds, given that the table has over 500k records. I've tried a few variations of indices, but nothing seems to really work.
Here's the query.
SELECT DISTINCT visitor_id, date_time
FROM pt_transactions
WHERE visitor_id NOT IN (SELECT visitor_id FROM pt_transactions WHERE date_time < '$this->_date_time_start')
AND campaign_id = $this->_campaign_id
AND a_aid = '$a_aid'
AND date_time >= '$this->_date_time_start'
AND date_time <= '$this->_date_time_end'
AND product_id != 65
And here's the complete table structure.
CREATE TABLE IF NOT EXISTS `pt_transactions` (
`id` int(32) NOT NULL AUTO_INCREMENT,
`type` varchar(2) NOT NULL COMMENT 'New Lead (NL), Raw Optin (RO), Base Sale (BS), Upsell Sale (US), Recurring Sale (RS), Base Refund (BR), Upsell Refund (UR), Recurring Refund (RR), Unknown Refund (XR), or Chargeback (C)',
`date_time` datetime NOT NULL,
`amount` varchar(255) NOT NULL,
`a_aid` varchar(255) NOT NULL,
`subid1` varchar(255) NOT NULL,
`subid2` varchar(255) NOT NULL,
`subid3` varchar(255) NOT NULL,
`product_id` int(16) NOT NULL,
`visitor_id` int(32) NOT NULL,
`campaign_id` int(16) NOT NULL,
`last_click_id` int(16) NOT NULL,
`trackback_type` varchar(255) NOT NULL COMMENT 'Shows if the transaction is tracked back to the original visitor via cookie or via IP. Usually only applies to sales via pixel.',
`original_transaction_id` int(32) NOT NULL COMMENT 'Reference to original transaction id, in this table, if type is RS, R, or C',
`recurring_transaction_id` varchar(32) NOT NULL COMMENT 'Reference to existing RecurringTransaction if type is RS',
PRIMARY KEY (`id`),
KEY `visitor_id` (`visitor_id`),
KEY `campaign_id` (`visitor_id`,`campaign_id`,`amount`,`product_id`),
KEY `transaction_retrieval_group` (`campaign_id`,`date_time`,`a_aid`),
KEY `type` (`type`),
KEY `date_time` (`date_time`),
KEY `original_source` (`campaign_id`,`a_aid`,`date_time`,`product_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=574636
You can try NOT EXISTS
SELECT DISTINCT visitor_id, date_time
FROM pt_transactions t
WHERE campaign_id = $this->_campaign_id
AND a_aid = '$a_aid'
AND date_time >= '$this->_date_time_start'
AND date_time <= '$this->_date_time_end'
AND product_id != 65
AND NOT EXISTS
(
SELECT *
FROM pt_transactions
WHERE visitor_id = t.visitor_id
AND date_time < '$this->_date_time_start'
)
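For that correlated subquery to be cheap, MySQL should be able to probe by visitor_id and check date_time within the same index; NOT EXISTS only has to find one earlier row per visitor. A sketch (the index name is illustrative):
ALTER TABLE pt_transactions ADD INDEX visitor_datetime (visitor_id, date_time);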
Do EXPLAIN <query> and see how your indices are used. If you want, you can post the results in your question in textual form.
From your query, what I can understand is that there is no need for the NOT IN statement. Because you are already keeping a check for
date_time >= '$this->_date_time_start'
there is no need to also check date_time < '$this->_date_time_start' in a NOT IN statement.
The query below alone should work fine :)
SELECT DISTINCT visitor_id, date_time
FROM pt_transactions
WHERE campaign_id = $this->_campaign_id
AND a_aid = '$a_aid'
AND date_time >= '$this->_date_time_start'
AND date_time <= '$this->_date_time_end'
AND product_id != 65

Best way to count millions of rows between certain dates in MySQL

Here is my SQL for creating a table:
$sql_create_table = "CREATE TABLE {$table_name} (
hit_id bigint(20) unsigned NOT NULL auto_increment,
user_id int(7) unsigned NOT NULL default '0',
optin_id int(8) unsigned NOT NULL default '0',
hit_date datetime NOT NULL default '0000-00-00 00:00:00',
hit_type varchar(10) NOT NULL default '',
PRIMARY KEY (hit_id),
KEY user_id (user_id)
) $charset_collate; ";
I need to know the fastest way to count the number of rows within a query. My current query doesn't cut it for going through millions of rows.
$sql = "SELECT hit_id FROM $table_name WHERE user_id = %d AND hit_type = %s AND hit_date >= FROM_UNIXTIME(%d) AND hit_date <= FROM_UNIXTIME(%d)";
I've tried this with no luck (not returning the proper results):
$sql = "SELECT COUNT(*) FROM $table_name WHERE user_id = %d AND hit_type = %s AND hit_date >= FROM_UNIXTIME(%d) AND hit_date <= FROM_UNIXTIME(%d)";
What do I need to do to make this query efficient so that it doesn't time out on millions of rows? I simply want to count the number of rows matching the specified parameters.
I'm not sure of the performance of the FROM_UNIXTIME function, but the first thing I would do is create an index on hit_date.
http://dev.mysql.com/doc/refman/5.0/en/create-index.html
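A composite index matching all three predicates (equality columns first, the range column last) should help even more than hit_date alone; a sketch, with the index name illustrative and {$table_name} standing in for your real table:
ALTER TABLE {$table_name} ADD INDEX user_type_date (user_id, hit_type, hit_date);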

Server uptime script

I'm trying to do something for my website; to be specific, a script for uptime.
I have the reader, a script which reads the percentages from a table.
This is not a very efficient way to use a relational database. I would instead suggest (at least on the SQL side) the following:
CREATE TABLE `servers` (
`srv_id` INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
-- Additional Fields Omitted here.
PRIMARY KEY (`srv_id`)
)
ENGINE = InnoDB;
CREATE TABLE `stats` (
`stat_id` INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
`srv_id` INTEGER UNSIGNED NOT NULL,
`date` TIMESTAMP NOT NULL,
`uptime` INTEGER UNSIGNED NOT NULL,
PRIMARY KEY (`stat_id`)
)
ENGINE = InnoDB;
This way you can record as many measures as you like, against as many servers as you like, and then use SQL to either delete old content or keep the old content and use WHERE arguments to filter the data used in the interface displaying these stats.
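For example (a sketch, assuming uptime is recorded as a 0/1 flag once per monitoring pass), the display page can derive the percentage over any window:
INSERT INTO stats (srv_id, date, uptime) VALUES (1, NOW(), 1);
SELECT srv_id, 100 * AVG(uptime) AS uptime_pct
FROM stats
WHERE date >= NOW() - INTERVAL 30 DAY
GROUP BY srv_id;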
$day = (int)(strftime("%j") % 5);
$key = 'day' . $day;
if ($row[$key] == 0) {
    if ($checkls && $checkgs) { // if the server is online, update the percentage
        // every 7.2 minutes add 0.5 percent
        mysql_query("UPDATE s_stats SET {$key}=".($stats_row[$key] + 0.5)." WHERE srv_id=".$r['id']) or die(mysql_error());
    } else {
        echo "error day $day";
    }
}
