Optimizing MySQL queries isn't my expertise, so I was wondering if someone could help me formulate the most efficient query here (and the indices to go with it).
As background, I'm trying to find distinct visitor ids within a table of transactions that match certain WHERE criteria (date range, not a certain product, etc., as you can see in the query below). Visitors and transactions have a one-to-many relationship, so a single visitor can have many transactions.
Another requirement is that if a visitor_id is found in the result, it must be the first instance of that visitor_id (by date_time) in the entire table. In other words, the visitor_id should only exist in the date range set in the primary query and at no time beforehand.
Here's what I've put together so far. It uses NOT IN and a subquery, but this doesn't seem ideal: the query takes 2-3 seconds on a table of over 500k records. I've tried a few variations of indices, but nothing seems to really help.
Here's the query.
SELECT DISTINCT visitor_id, date_time
FROM pt_transactions
WHERE visitor_id NOT IN (SELECT visitor_id FROM pt_transactions WHERE date_time < '$this->_date_time_start')
AND campaign_id = $this->_campaign_id
AND a_aid = '$a_aid'
AND date_time >= '$this->_date_time_start'
AND date_time <= '$this->_date_time_end'
AND product_id != 65
And here's the complete table structure.
CREATE TABLE IF NOT EXISTS `pt_transactions` (
`id` int(32) NOT NULL AUTO_INCREMENT,
`type` varchar(2) NOT NULL COMMENT 'New Lead (NL), Raw Optin (RO), Base Sale (BS), Upsell Sale (US), Recurring Sale (RS), Base Refund (BR), Upsell Refund (UR), Recurring Refund (RR), Unknown Refund (XR), or Chargeback (C)',
`date_time` datetime NOT NULL,
`amount` varchar(255) NOT NULL,
`a_aid` varchar(255) NOT NULL,
`subid1` varchar(255) NOT NULL,
`subid2` varchar(255) NOT NULL,
`subid3` varchar(255) NOT NULL,
`product_id` int(16) NOT NULL,
`visitor_id` int(32) NOT NULL,
`campaign_id` int(16) NOT NULL,
`last_click_id` int(16) NOT NULL,
`trackback_type` varchar(255) NOT NULL COMMENT 'Shows if the transaction is tracked back to the original visitor via cookie or via IP. Usually only applies to sales via pixel.',
`original_transaction_id` int(32) NOT NULL COMMENT 'Reference to original transaction id, in this table, if type is RS, R, or C',
`recurring_transaction_id` varchar(32) NOT NULL COMMENT 'Reference to existing RecurringTransaction if type is RS',
PRIMARY KEY (`id`),
KEY `visitor_id` (`visitor_id`),
KEY `campaign_id` (`visitor_id`,`campaign_id`,`amount`,`product_id`),
KEY `transaction_retrieval_group` (`campaign_id`,`date_time`,`a_aid`),
KEY `type` (`type`),
KEY `date_time` (`date_time`),
KEY `original_source` (`campaign_id`,`a_aid`,`date_time`,`product_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=574636
You can try NOT EXISTS
SELECT DISTINCT visitor_id, date_time
FROM pt_transactions t
WHERE campaign_id = $this->_campaign_id
AND a_aid = '$a_aid'
AND date_time >= '$this->_date_time_start'
AND date_time <= '$this->_date_time_end'
AND product_id != 65
AND NOT EXISTS
(
SELECT *
FROM pt_transactions
WHERE visitor_id = t.visitor_id
AND date_time < '$this->_date_time_start'
)
Do EXPLAIN <query> and see how your indices are used. If you want you can post results in your question in a textual form.
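As a rough illustration of the NOT EXISTS approach, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. The table is trimmed to the columns the query touches, the sample rows are invented, and the composite index `visitor_date` on (visitor_id, date_time) is a hypothetical addition that is not in the original schema; whether it helps on the real MySQL table is something only EXPLAIN can confirm. Bound parameters stand in for the interpolated PHP variables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pt_transactions (
    id INTEGER PRIMARY KEY,
    visitor_id INTEGER NOT NULL,
    campaign_id INTEGER NOT NULL,
    a_aid TEXT NOT NULL,
    product_id INTEGER NOT NULL,
    date_time TEXT NOT NULL
);
-- Hypothetical index (an assumption) to serve the correlated lookup.
CREATE INDEX visitor_date ON pt_transactions (visitor_id, date_time);
""")

# Invented sample data: (visitor_id, campaign_id, a_aid, product_id, date_time)
rows = [
    (1, 7, "aff", 10, "2012-12-20 09:00:00"),  # visitor 1 was seen before the range
    (1, 7, "aff", 10, "2013-01-05 10:00:00"),
    (2, 7, "aff", 10, "2013-01-06 11:00:00"),  # visitor 2 first appears in the range
    (3, 7, "aff", 65, "2013-01-07 12:00:00"),  # excluded product
]
conn.executemany(
    "INSERT INTO pt_transactions "
    "(visitor_id, campaign_id, a_aid, product_id, date_time) VALUES (?, ?, ?, ?, ?)",
    rows,
)


def first_time_visitors(conn, campaign_id, a_aid, start, end):
    """Return (visitor_id, date_time) rows for visitors with no earlier transaction."""
    sql = """
        SELECT DISTINCT t.visitor_id, t.date_time
        FROM pt_transactions t
        WHERE t.campaign_id = ?
          AND t.a_aid = ?
          AND t.date_time >= ?
          AND t.date_time <= ?
          AND t.product_id != 65
          AND NOT EXISTS (
              SELECT 1
              FROM pt_transactions p
              WHERE p.visitor_id = t.visitor_id
                AND p.date_time < ?
          )
    """
    return conn.execute(sql, (campaign_id, a_aid, start, end, start)).fetchall()


result = first_time_visitors(
    conn, 7, "aff", "2013-01-01 00:00:00", "2013-01-31 23:59:59"
)
```

With this sample data only visitor 2 qualifies: visitor 1 has a transaction before the start date, and visitor 3 only bought the excluded product.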
From what I can understand of your query...
There is no need to write the NOT IN statement...
Because you are already checking for
date_time >= '$this->_date_time_start'
so there is no need to check date_time < '$this->_date_time_start' in the NOT IN statement.
Just the query below should work fine :)
SELECT DISTINCT visitor_id, date_time
FROM pt_transactions
WHERE campaign_id = $this->_campaign_id
AND a_aid = '$a_aid'
AND date_time >= '$this->_date_time_start'
AND date_time <= '$this->_date_time_end'
AND product_id != 65
This is what my code looks like:
foreach ($instruments as $instrument) {
// First available row on or after the start date
$stmt = $pdo->prepare("SELECT date, adjusted_close, close FROM ehd_historical_data WHERE exchange = ? AND symbol = ? AND date >= ? ORDER BY date asc LIMIT 1");
$stmt->execute([xyzToExchange($instrument), xyzToSymbol($instrument), $startDate]);
$data1 = $stmt->fetch(PDO::FETCH_ASSOC);
// Most recent row for the same instrument
$stmt = $pdo->prepare("SELECT date, adjusted_close, close FROM ehd_historical_data WHERE exchange = ? AND symbol = ? ORDER BY date desc LIMIT 1");
$stmt->execute([xyzToExchange($instrument), xyzToSymbol($instrument)]);
$data2 = $stmt->fetch(PDO::FETCH_ASSOC);
}
There are around 2000 instruments, which are strings in the format "NASDAQ:AAPL".
It currently takes 7 seconds to complete since the database has around 50 million rows.
So far:
I have set INDEX for exchange, symbol and date together.
Set another INDEX for exchange and symbol together.
What else can I do to optimize this query?
Note:
The function which this code is part of tries to find the price difference and the percent change between the start date and today's date. The start date can be anything like 6 months ago, 3 months ago.
I tried merging them in one large query and then executing them. Still same problem.
Update:
EXPLAIN for both queries
Table Schema
CREATE TABLE `ehd_historical_data` (
`exchange` varchar(255) NOT NULL,
`symbol` varchar(255) NOT NULL,
`date` date NOT NULL,
`open` decimal(20,10) NOT NULL,
`high` decimal(20,10) NOT NULL,
`low` decimal(20,10) NOT NULL,
`close` decimal(20,10) NOT NULL,
`adjusted_close` decimal(20,10) NOT NULL,
`volume` decimal(20,0) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
ALTER TABLE `ehd_historical_data`
ADD UNIQUE KEY `exchange_2` (`exchange`,`symbol`,`date`),
ADD KEY `exchange` (`exchange`),
ADD KEY `date` (`date`),
ADD KEY `symbol` (`symbol`),
ADD KEY `exchange_3` (`exchange`,`symbol`);
COMMIT;
Try selecting both rows in a single query using row_number()
select *
from (
SELECT date, adjusted_close, close,
row_number() over(order by date desc) rn1,
row_number() over(order by date asc) rn2
FROM ehd_historical_data
WHERE exchange = ? AND symbol = ? AND date >= ?
) t
where rn1 = 1 or rn2 = 1
You may also request all symbols at once. Note the PARTITION BY clause.
select *
from (
SELECT exchange, symbol, date, adjusted_close, close,
row_number() over(partition by exchange, symbol order by date desc) rn1,
row_number() over(partition by exchange, symbol order by date asc) rn2
FROM ehd_historical_data
WHERE ((exchange = 'NASDAQ' AND symbol = 'AAPL') OR (exchange = 'NASDAQ' AND symbol = 'MSFT') OR (exchange = 'NASDAQ' AND symbol = 'TSLA')) AND date >= ?
) t
where rn1 = 1 or rn2 = 1
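To show how the partitioned row_number() result might be consumed, here is a small sketch using Python's sqlite3 module (window functions need SQLite 3.25 or newer; MySQL needs 8.0 or newer). The table is cut down to four columns, the prices are invented, and the `percent_changes` helper is a hypothetical name. It relies on the rows arriving ordered by date within each instrument, so the first row seen is the start price and the last is the latest price.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ehd_historical_data ("
    "exchange TEXT, symbol TEXT, date TEXT, close REAL, "
    "PRIMARY KEY (exchange, symbol, date))"
)
# Invented sample prices.
conn.executemany(
    "INSERT INTO ehd_historical_data VALUES (?, ?, ?, ?)",
    [
        ("NASDAQ", "AAPL", "2023-01-02", 100.0),
        ("NASDAQ", "AAPL", "2023-01-03", 110.0),
        ("NASDAQ", "AAPL", "2023-01-04", 125.0),
        ("NASDAQ", "MSFT", "2023-01-02", 200.0),
        ("NASDAQ", "MSFT", "2023-01-04", 190.0),
    ],
)

FIRST_AND_LAST = """
    SELECT exchange, symbol, date, close
    FROM (
        SELECT exchange, symbol, date, close,
               ROW_NUMBER() OVER (PARTITION BY exchange, symbol ORDER BY date DESC) AS rn1,
               ROW_NUMBER() OVER (PARTITION BY exchange, symbol ORDER BY date ASC)  AS rn2
        FROM ehd_historical_data
        WHERE date >= ?
    ) t
    WHERE rn1 = 1 OR rn2 = 1
    ORDER BY exchange, symbol, date
"""


def percent_changes(conn, start_date):
    """Map 'EXCHANGE:SYMBOL' to the percent change from the first to the last close."""
    first_close = {}
    changes = {}
    for exchange, symbol, _date, close in conn.execute(FIRST_AND_LAST, (start_date,)):
        key = f"{exchange}:{symbol}"
        # Rows are ordered by date, so the first row per key is the start price.
        first_close.setdefault(key, close)
        changes[key] = (close - first_close[key]) * 100.0 / first_close[key]
    return changes


changes = percent_changes(conn, "2023-01-01")
```

An instrument with a single row in the range gets one row back (it is both first and last), so its change comes out as zero.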
Your index (exchange,symbol,date) is optimal for both of those SELECTs, so let's dig into other causes for sluggishness.
CREATE TABLE `ehd_historical_data` (
`exchange` varchar(255) NOT NULL, -- Don't use 255 if you don't need it
`symbol` varchar(255) NOT NULL, -- ditto
`date` date NOT NULL,
`open` decimal(20,10) NOT NULL, -- overkill
`high` decimal(20,10) NOT NULL,
`low` decimal(20,10) NOT NULL,
`close` decimal(20,10) NOT NULL,
`adjusted_close` decimal(20,10) NOT NULL,
`volume` decimal(20,0) NOT NULL
) ENGINE=InnoDB DEFAULT
CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci; -- probably all are ascii
ALTER TABLE `ehd_historical_data`
ADD UNIQUE KEY `exchange_2` (`exchange`,`symbol`,`date`), -- Change to PRIMARY KEY
ADD KEY `exchange` (`exchange`), -- redundant, DROP
ADD KEY `date` (`date`),
ADD KEY `symbol` (`symbol`),
ADD KEY `exchange_3` (`exchange`,`symbol`); -- redundant, DROP
COMMIT;
Have another table of symbols; use a MEDIUMINT UNSIGNED in this table.
decimal(20,10) takes 10 bytes; I know of no symbol that needs that much precision or range.
The above comments are aimed at making the table smaller. If the table is currently bigger than will fit in cache, I/O will be the cause of sluggishness.
How much RAM do you have? What is the value of `innodb_buffer_pool_size`?
How fast does this run? (I'm thinking there might be a way to get all 2000 results in a single SQL. This might be a component of it.)
SELECT exchange, symbol,
MIN(date) AS date1,
MAX(date) AS date2
FROM ehd_historical_data
WHERE date > ?
GROUP BY exchange, symbol
That would be JOINed back to the table twice, once for each date.
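A sketch of what "JOINed back to the table twice" could look like, again run through Python's sqlite3 so it can be tried without a MySQL server. The sample prices are invented and only `close` is carried along. The filter uses `date >= ?` to match the question's original queries; that choice, and the column aliases, are assumptions rather than part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ehd_historical_data ("
    "exchange TEXT, symbol TEXT, date TEXT, close REAL, "
    "PRIMARY KEY (exchange, symbol, date))"
)
# Invented sample prices.
conn.executemany(
    "INSERT INTO ehd_historical_data VALUES (?, ?, ?, ?)",
    [
        ("NASDAQ", "AAPL", "2023-01-02", 100.0),
        ("NASDAQ", "AAPL", "2023-01-03", 110.0),
        ("NASDAQ", "AAPL", "2023-01-04", 125.0),
        ("NASDAQ", "MSFT", "2023-01-02", 200.0),
        ("NASDAQ", "MSFT", "2023-01-04", 190.0),
    ],
)

RANGE_ENDPOINTS = """
    SELECT r.exchange, r.symbol,
           d1.close AS start_close,
           d2.close AS end_close
    FROM (
        -- One row per instrument with its first and last date in the range
        SELECT exchange, symbol, MIN(date) AS date1, MAX(date) AS date2
        FROM ehd_historical_data
        WHERE date >= ?
        GROUP BY exchange, symbol
    ) r
    JOIN ehd_historical_data d1
      ON d1.exchange = r.exchange AND d1.symbol = r.symbol AND d1.date = r.date1
    JOIN ehd_historical_data d2
      ON d2.exchange = r.exchange AND d2.symbol = r.symbol AND d2.date = r.date2
    ORDER BY r.exchange, r.symbol
"""

endpoints = conn.execute(RANGE_ENDPOINTS, ("2023-01-01",)).fetchall()
```

Both joins are equality lookups on (exchange, symbol, date), which is exactly the unique key the table already has.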
Before I dive in: the example below is what I hope to achieve.
Meanwhile, I have a database table that is structured like so.
CREATE TABLE `notifications` (
`id` int(11) NOT NULL,
`recipient_id` int(11) NOT NULL,
`sender_id` int(11) NOT NULL,
`unread` tinyint(4) NOT NULL DEFAULT '1',
`type` varchar(255) NOT NULL,
`reference_id` int(11) NOT NULL,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
`recipient_id` - Notification receiver
`sender_id` - Notification sender
`unread` - Marks whether the notification has been read
`type` - Type of notification (comment, like, etc.)
`reference_id` - The referenced item, such as a post id
`created_at` - Time the notification was created
And the PHP that fetches data from the table:
$query = $this->database->query("SELECT recipient_id, unread, type, reference_id, post_title, post_name, COUNT(reference_id)
FROM notifications n INNER JOIN posts p ON p.id = n.reference_id WHERE n.recipient_id = $user AND n.type = 'post_comment' AND n.unread = 1
GROUP BY n.reference_id HAVING COUNT(n.reference_id) >= 1 ORDER BY unread DESC LIMIT 8")->fetchAll();
return $query;
I'm grouping the results by reference_id (where the count is >= 1) so that I don't get duplicate notifications with the same reference_id.
With this query I am able to get the data from the table and display it like so:
Someone commented on your post "I love to code"
But I want to display it to the user like the example above, or like this:
James, John and others commented on your post "I Love to Code"
That is, when more than 1 or 2 sender_ids share the same reference_id.
This is where I am stuck and don't know what step to take next. Any help is appreciated.
Thanks
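One possible way to get from the grouped rows to the desired sentence is to fetch one row per sender and build the text in application code. The sketch below does that in Python against an in-memory SQLite database. It assumes a `users` table with a `name` column, which does not appear in the question, and the `summarize` and `unread_comment_messages` helpers are hypothetical names. In MySQL the collecting step could likely be pushed into SQL with GROUP_CONCAT(DISTINCT ... ORDER BY ...) instead.

```python
import sqlite3


def summarize(names, title, shown=2):
    """Build 'James, John and others commented on your post "..."'."""
    unique = list(dict.fromkeys(names))  # de-duplicate senders, keep order
    if len(unique) > shown:
        who = ", ".join(unique[:shown]) + " and others"
    elif len(unique) > 1:
        who = ", ".join(unique[:-1]) + " and " + unique[-1]
    else:
        who = unique[0]
    return f'{who} commented on your post "{title}"'


conn = sqlite3.connect(":memory:")
conn.executescript("""
-- users(name) is an assumption; the question does not show this table.
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE posts (id INTEGER PRIMARY KEY, post_title TEXT NOT NULL);
CREATE TABLE notifications (
    id INTEGER PRIMARY KEY,
    recipient_id INTEGER NOT NULL,
    sender_id INTEGER NOT NULL,
    unread INTEGER NOT NULL DEFAULT 1,
    type TEXT NOT NULL,
    reference_id INTEGER NOT NULL
);
INSERT INTO users VALUES (1, 'James'), (2, 'John'), (3, 'Mary');
INSERT INTO posts VALUES (5, 'I Love to Code'), (6, 'Hello');
INSERT INTO notifications (recipient_id, sender_id, type, reference_id) VALUES
    (9, 1, 'post_comment', 5),
    (9, 2, 'post_comment', 5),
    (9, 3, 'post_comment', 5),
    (9, 1, 'post_comment', 6);
""")


def unread_comment_messages(conn, user_id):
    """Return one message per post, naming the senders who commented on it."""
    rows = conn.execute(
        """
        SELECT n.reference_id, p.post_title, u.name
        FROM notifications n
        JOIN posts p ON p.id = n.reference_id
        JOIN users u ON u.id = n.sender_id
        WHERE n.recipient_id = ?
          AND n.type = 'post_comment'
          AND n.unread = 1
        ORDER BY n.reference_id, n.id
        """,
        (user_id,),
    ).fetchall()
    grouped = {}
    for reference_id, title, name in rows:
        grouped.setdefault((reference_id, title), []).append(name)
    return [summarize(names, title) for (_, title), names in grouped.items()]


messages = unread_comment_messages(conn, 9)
```

The `shown` parameter controls how many names are spelled out before falling back to "and others".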
So I know that there are many posts on this topic on this website, and the closest one I could find was:
Can I take the results from two rows and combine them into one?
I am working on a project that involves 'accounts receivables' and 'accounts payable', but that both of those need data in a single list:
date | description | reference | debit | credit
I have read about the MySQL UNION statement being used to combine two result sets into one. However, it also appears that the two result sets must match in column count and type, according to the website below:
http://www.w3schools.com/sql/sql_union.asp
The problem I'm facing is that the two result sets don't have the same column count, as the information for one doesn't directly correlate to the other (which would rule out the UNION statement). What would be the best practice for acquiring the data from the two tables and sorting it by date? I'll include my SQL calls below as reference:
Accounts Receivable:
SELECT tblARP.*,tblAR.invoiceID,tblAR.ledgerID
FROM Accounting_ReceivablesPayments tblARP
INNER JOIN Accounting_Receivables tblAR ON tblARP.invoiceID = tblAR.invoiceID
ORDER BY deposited
Accounts Payable:
SELECT tblAPP.*,tblAP.id,tblAP.ledgerID,tblAP.tblName,tblAP.rowID,tblAP.invoice
FROM Accounting_PayablesPayments tblAPP
INNER JOIN Accounting_Payables tblAP ON tblAPP.payablesID = tblAP.id
ORDER BY deposited
UPDATE
Per the requests in the comments, here are the columns for the tables:
Accounting_Receivables
id BIGINT PRIMARY KEY NOT NULL AUTO_INCREMENT UNIQUE,
invoiceID BIGINT NOT NULL,
amount DECIMAL(9,2) NOT NULL DEFAULT '1.00',
ledgerID BIGINT NOT NULL,
note TEXT
Accounting_ReceivablesPayments
id BIGINT PRIMARY KEY NOT NULL AUTO_INCREMENT UNIQUE,
invoiceID BIGINT NOT NULL,
received DATE NOT NULL,
type VARCHAR(10) NOT NULL,
amount DECIMAL(9,2) NOT NULL DEFAULT '1.00',
deposited DATE,
tag VARCHAR(32) NOT NULL
Accounting_Payables
id BIGINT PRIMARY KEY NOT NULL AUTO_INCREMENT UNIQUE,
paid TINYINT(1) UNSIGNED NOT NULL DEFAULT '0',
invoice BIGINT NOT NULL,
amount DECIMAL(9,2) NOT NULL DEFAULT '1.00',
terms VARCHAR(3) NOT NULL DEFAULT 'net',
due DATE,
tblName VARCHAR(48) NOT NULL,
rowID BIGINT NOT NULL,
ledgerID BIGINT NOT NULL,
note TEXT
Accounting_PayablesPayments
id BIGINT PRIMARY KEY NOT NULL AUTO_INCREMENT UNIQUE,
payablesID BIGINT NOT NULL,
created DATE NOT NULL,
type VARCHAR(10) NOT NULL,
amount DECIMAL(9,2) NOT NULL DEFAULT '1.00',
deposited DATE,
tag VARCHAR(32) NOT NULL
Following up on what I was saying in the comments, you should do this:
( SELECT
tblARP.*,
tblAR.invoiceID,
tblAR.ledgerID,
NULL, -- NULL placeholders so both SELECTs return the same number of columns
NULL,
NULL
FROM `Accounting_ReceivablesPayments` tblARP
INNER JOIN `Accounting_Receivables` tblAR ON tblARP.invoiceID = tblAR.invoiceID
)
UNION ALL -- UNION ALL keeps every row from both SELECTs
( SELECT
tblAPP.*,
tblAP.id,
tblAP.ledgerID,
tblAP.tblName,
tblAP.rowID,
tblAP.invoice
FROM `Accounting_PayablesPayments` tblAPP
INNER JOIN `Accounting_Payables` tblAP ON tblAPP.payablesID = tblAP.id
)
ORDER BY deposited -- sorts the combined result; an ORDER BY inside a parenthesized SELECT without LIMIT does not order the final output
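Since the goal in the question is a single date | description | reference | debit | credit list, another option is to select explicitly aliased columns rather than `*`, so both halves of the UNION line up by design. Below is a minimal sketch of that idea, run through Python's sqlite3 with invented rows and only the columns it needs. Which side counts as debit and which as credit, the description strings, and the `entry_date` alias are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Accounting_Receivables (id INTEGER PRIMARY KEY, invoiceID INTEGER, ledgerID INTEGER);
CREATE TABLE Accounting_ReceivablesPayments (id INTEGER PRIMARY KEY, invoiceID INTEGER, amount REAL, deposited TEXT);
CREATE TABLE Accounting_Payables (id INTEGER PRIMARY KEY, invoice INTEGER, ledgerID INTEGER);
CREATE TABLE Accounting_PayablesPayments (id INTEGER PRIMARY KEY, payablesID INTEGER, amount REAL, deposited TEXT);

-- Invented sample rows.
INSERT INTO Accounting_Receivables VALUES (1, 100, 1);
INSERT INTO Accounting_ReceivablesPayments VALUES (1, 100, 50.0, '2024-01-10');
INSERT INTO Accounting_Payables VALUES (1, 900, 1);
INSERT INTO Accounting_PayablesPayments VALUES (1, 1, 20.0, '2024-01-05');
""")

LEDGER = """
    SELECT tblARP.deposited     AS entry_date,
           'Receivable payment' AS description,
           tblAR.invoiceID      AS reference,
           NULL                 AS debit,   -- assumption: receipts go in the credit column
           tblARP.amount        AS credit
    FROM Accounting_ReceivablesPayments tblARP
    INNER JOIN Accounting_Receivables tblAR ON tblARP.invoiceID = tblAR.invoiceID
    UNION ALL
    SELECT tblAPP.deposited,
           'Payable payment',
           tblAP.invoice,
           tblAPP.amount,                   -- assumption: payments go in the debit column
           NULL
    FROM Accounting_PayablesPayments tblAPP
    INNER JOIN Accounting_Payables tblAP ON tblAPP.payablesID = tblAP.id
    ORDER BY entry_date
"""

ledger = conn.execute(LEDGER).fetchall()
```

The column names of the combined result come from the first SELECT, which is why the final ORDER BY can refer to `entry_date`.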
I have a db table structured like this:
--
-- Table structure for table `table_submissions`
--
CREATE TABLE IF NOT EXISTS `table_submissions` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Submission ID',
`item_name` bigint(20) unsigned NOT NULL COMMENT 'Item Name',
`status` tinyint(1) NOT NULL DEFAULT '0' COMMENT '0 = pending, 1 = approved, -1 = denied',
PRIMARY KEY (`id`),
KEY `item_name` (`item_name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=2 ;
php
$date_info = $db->fetchOne("SELECT * FROM table_submissions");
I need PHP code for a "Most Required Item" list, ordered by the count of each item_name and limited to 10.
That is, the item_name that visitors have requested the most should be at the top, followed by the next 9.
How can I do that, please?
I believe you want this:
SELECT
item_name,
count(1)
FROM
table_submissions
GROUP BY
item_name
ORDER BY
count(1) DESC
LIMIT 10;
Note that this does not distinguish between one user requesting the same item 1000 times and 1000 users requesting an item once each. If that's important to you, then you need a way to filter by user ID.
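To make the difference concrete, here is a small sketch comparing the raw count with a per-user count. It assumes a `user_id` column on `table_submissions`, which the table in the question does not have, and uses invented rows in an in-memory SQLite database driven from Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_submissions ("
    "id INTEGER PRIMARY KEY, "
    "item_name INTEGER NOT NULL, "
    "user_id INTEGER NOT NULL, "  # hypothetical column, not in the original table
    "status INTEGER NOT NULL DEFAULT 0)"
)
# Item 1: requested five times by the same user. Item 2: requested once each by three users.
submissions = [(1, 1)] * 5 + [(2, 1), (2, 2), (2, 3)]
conn.executemany(
    "INSERT INTO table_submissions (item_name, user_id) VALUES (?, ?)", submissions
)

# Every submission counts, so a single persistent user can push an item to the top.
by_requests = conn.execute(
    "SELECT item_name, COUNT(*) AS requests "
    "FROM table_submissions "
    "GROUP BY item_name "
    "ORDER BY requests DESC, item_name "
    "LIMIT 10"
).fetchall()

# Each user counts at most once per item.
by_users = conn.execute(
    "SELECT item_name, COUNT(DISTINCT user_id) AS requesters "
    "FROM table_submissions "
    "GROUP BY item_name "
    "ORDER BY requesters DESC, item_name "
    "LIMIT 10"
).fetchall()
```

The two rankings disagree on this data: item 1 leads by raw requests, while item 2 leads by distinct requesters.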
I have a seemingly simple task but I cannot seem to find an elegant solution using 1 query...
Problem:
I have a table of recorded 'clicks' on 'posts', where each post is part of a 'category'.
I want to find the 16 highest clicked posts in the last 30 days -- but I want to avoid duplicate categories.
It seems very simple actually, but I seem to be stuck.
I know how to get the most clicked in last 30, but I can't figure out how to avoid duplicate cats.
SELECT cat_id,
post_id,
COUNT(post_id) AS click_counter
FROM cs_coupon_clicks
WHERE time_of_click > DATE_SUB(NOW(), INTERVAL 30 DAY)
GROUP BY post_id
ORDER BY click_counter DESC
I tried to get creative/hacky with it... it's close but not correct:
SELECT cat_id,
Max(sort) AS sortid
FROM (SELECT cat_id,
post_id,
COUNT(post_id) AS click_counter,
CONCAT(COUNT(post_id), '-', post_id) AS sort
FROM cs_coupon_clicks
WHERE time_of_click > DATE_SUB(NOW(), INTERVAL 30 DAY)
GROUP BY cat_id, post_id) t1
GROUP BY cat_id
ORDER BY cat_id ASC
Any help would be greatly appreciated as I am not really a MySQL expert. I may end up just doing some PHP logic in the end, but I am very curious as to the correct way to approach a problem like this.
Thanks guys.
EDIT (structure):
CREATE TABLE `cs_coupon_clicks` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`src` varchar(255) NOT NULL DEFAULT '',
`cat_id` int(20) NOT NULL,
`post_id` int(20) NOT NULL,
`tag_id` int(20) NOT NULL,
`user_id` int(20) DEFAULT NULL,
`ip_address` char(30) DEFAULT NULL,
`referer` varchar(255) NOT NULL,
`browser` varchar(10) DEFAULT NULL,
`server_var` text NOT NULL,
`time_of_click` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `cat_id` (`cat_id`),
KEY `post_id` (`post_id`),
KEY `user_id` (`user_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
TEMP WORKING SOLUTION (HACKY):
SELECT
cat_id,
MAX(sort) AS sortid
FROM (
SELECT
cat_id,
post_id,
COUNT(post_id) AS click_counter,
RIGHT(Concat('00000000', COUNT(post_id), '-', post_id), 16) AS SORT
FROM cs_coupon_clicks
WHERE time_of_click > DATE_SUB(NOW(), INTERVAL 30 DAY)
GROUP BY cat_id, post_id
) AS t1
GROUP BY cat_id
ORDER BY sortid DESC
There is no easy single-query solution to this problem. It's a group-wise maximum kind of problem based on a temporary table (the one with counts) that would require self-joins.
Assuming your database grows big enough (otherwise just go for your php logic) I would go for a statistics table, holding info about categories, posts and click counts:
CREATE TABLE `click_cnts` (
`cat_id` int(20) NOT NULL,
`post_id` int(20) NOT NULL,
`clicks` int(20) NOT NULL,
PRIMARY KEY (`cat_id`,`post_id`),
KEY `cat_id` (`cat_id`,`clicks`)
)
and fill it using the same query as the first one in the question:
INSERT INTO click_cnts(cat_id, post_id, clicks)
SELECT cat_id, post_id, COUNT(post_id) AS click_counter
FROM cs_coupon_clicks
WHERE time_of_click > NOW() - INTERVAL 30 DAY
GROUP BY cat_id,post_id
You could update this table using triggers or by running an update query periodically (do users really need info up to the very last second? Probably not...). This saves a lot of processing, because finding the most-clicked post for each category on an indexed table takes far less time with a classic group-wise max approach:
SELECT cg.cat_id, cu.post_id, cg.most_clicks
FROM
( SELECT cat_id, max(clicks) as most_clicks FROM click_cnts
GROUP BY cat_id ) cg
JOIN click_cnts cu
ON cg.cat_id = cu.cat_id
AND cu.post_id = ( SELECT cc.post_id FROM click_cnts cc
WHERE cc.cat_id = cg.cat_id
AND cc.clicks = cg.most_clicks
LIMIT 1 )
ORDER BY cg.most_clicks DESC
LIMIT 16
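On servers with window function support (MySQL 8.0+, SQLite 3.25+), a different technique from the self-join above is to rank posts within each category with ROW_NUMBER() and keep only rank 1. This is a sketch of that alternative, not the statistics-table approach described in this answer. It runs against an in-memory SQLite database from Python, uses invented clicks, takes the cutoff as a bound parameter instead of NOW() - INTERVAL 30 DAY, and breaks ties by the lower post_id, which is an arbitrary assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cs_coupon_clicks ("
    "id INTEGER PRIMARY KEY, "
    "cat_id INTEGER NOT NULL, "
    "post_id INTEGER NOT NULL, "
    "time_of_click TEXT NOT NULL)"
)
# Invented clicks: (cat_id, post_id), all inside the window...
recent = [(1, 10)] * 3 + [(1, 11)] + [(2, 20)] * 2 + [(2, 21)] * 2
rows = [(c, p, "2024-01-15 12:00:00") for c, p in recent]
# ...plus one old click that falls outside it.
rows.append((3, 30, "2000-01-01 00:00:00"))
conn.executemany(
    "INSERT INTO cs_coupon_clicks (cat_id, post_id, time_of_click) VALUES (?, ?, ?)",
    rows,
)

TOP_POST_PER_CATEGORY = """
    SELECT cat_id, post_id, clicks
    FROM (
        SELECT cat_id, post_id, clicks,
               ROW_NUMBER() OVER (
                   PARTITION BY cat_id
                   ORDER BY clicks DESC, post_id   -- tie-break: lower post_id wins
               ) AS rn
        FROM (
            SELECT cat_id, post_id, COUNT(*) AS clicks
            FROM cs_coupon_clicks
            WHERE time_of_click > ?
            GROUP BY cat_id, post_id
        ) counts
    ) ranked
    WHERE rn = 1
    ORDER BY clicks DESC, cat_id
    LIMIT 16
"""

top_posts = conn.execute(TOP_POST_PER_CATEGORY, ("2024-01-01 00:00:00",)).fetchall()
```

Each category contributes at most one row, so the LIMIT 16 then picks the 16 best categories by their top post.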
Shot in the dark here. Did you try SELECT DISTINCT cat_id?