I'm trying to optimize a slow query and I've come across the following (running consistently over 8 seconds).
SELECT entryID, entryID AS iE, 0 AS eE, 'clarus1' AS locationID, dateStamp, amount AS income, NULL AS expense, reconciled, leaseID AS vendorID, incomeID AS expenseID
FROM structu_income.iLedger
WHERE (dateStamp BETWEEN '2009-04-16' AND '2012-02-29') AND incomeID IS NOT NULL
AND (
leaseID IN (
SELECT lease.leaseID FROM structu_assets.lease WHERE lease.unitID IN (
SELECT unit.unitID FROM structu_assets.unit WHERE unit.locationID = 'clarus1'
)
)
OR locationID IN (SELECT locationID FROM structu_assets.deed WHERE ownerID = 'clarus')
)
Here's the EXPLAIN:
My thought was to refactor the subqueries into JOINs, but keeping the logical OR is throwing me off.
In addition, the nested subqueries seem unavoidable, unless I predetermine the unitID in a separate query.
I'm not the original developer, but I'm charged with making this more performant without modifying the existing codebase or schema, so I'm attempting to pick off the slow queries.
As an aside, do cross database queries take a performance hit?
Add an index to your dateStamp column.
You could also use:
dateStamp > '2009-04-16' AND dateStamp < '2012-02-29'
instead of:
dateStamp BETWEEN '2009-04-16' AND '2012-02-29'
but note that BETWEEN is inclusive of both endpoints, so the strict comparisons drop the boundary dates; with an index in place, any performance difference between the two forms is usually negligible.
Finally, you can write a loop in PHP instead of subqueries.
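For reference, a minimal sketch of the index suggestion (assuming MySQL and that iLedger has no index on dateStamp yet; the index name is just illustrative):
ALTER TABLE structu_income.iLedger ADD INDEX idx_iledger_datestamp (dateStamp);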
It would be interesting to see if this answer a) works and b) is faster. Try this:
SELECT
il.entryID,
il.entryID AS iE,
0 AS eE,
'clarus1' AS locationID,
il.dateStamp,
il.amount AS income,
NULL AS expense,
il.reconciled,
il.leaseID AS vendorID,
il.incomeID AS expenseID
FROM
structu_income.iLedger il
INNER JOIN structu_assets.lease l ON il.leaseID = l.leaseID
INNER JOIN structu_assets.unit u ON l.unitID = u.unitID AND u.locationID = 'clarus1'
WHERE
il.dateStamp BETWEEN '2009-04-16' AND '2012-02-29'
AND il.incomeID IS NOT NULL
UNION
SELECT
il.entryID,
il.entryID AS iE,
0 AS eE,
'clarus1' AS locationID,
il.dateStamp,
il.amount AS income,
NULL AS expense,
il.reconciled,
il.leaseID AS vendorID,
il.incomeID AS expenseID
FROM
structu_income.iLedger il
INNER JOIN structu_assets.deed d ON il.locationID = d.locationID AND d.ownerID = 'clarus'
WHERE
il.dateStamp BETWEEN '2009-04-16' AND '2012-02-29'
AND il.incomeID IS NOT NULL
The first SELECT takes care of the first half of your OR condition, and the second SELECT adds in the results for the second half. UNION (unlike UNION ALL) removes duplicate rows, so you should get the same results as the original query.
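If you try this, a few supporting indexes along these lines may also help both halves of the UNION (a sketch only; the names are illustrative and assume no equivalent indexes already exist):
CREATE INDEX idx_iledger_date_income ON structu_income.iLedger (dateStamp, incomeID, leaseID);
CREATE INDEX idx_unit_location ON structu_assets.unit (locationID, unitID);
CREATE INDEX idx_deed_owner ON structu_assets.deed (ownerID, locationID);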
I have a table with a list of football matches. It has columns like match_id, team A score, team B score, round, match_date, etc.
For each row returned, I also need the highest margin of that round, where margin is the difference between team A's score and team B's score.
My query is:
SELECT *,( SELECT MAX(ABS((z.home_score - z.away_score)))
FROM tblform_matches z
WHERE YEAR(z.match_date) = YEAR(tblform_matches.match_date)
AND z.round = tblform_matches.round ) as highest_margin
from tblform_matches where some condition
This is a simplified query; "some condition" stands for a large query string that selects specific matches according to a filter.
Currently there are around 5000 matches in the database.
Due to the sub-query, my page is taking 4 more seconds to load.
There are 9 matches in each round and more than 20 rounds in every year.
I am executing the above query for every team in a PHP loop. I can't change that, as there is a lot of calculation for showing stats.
Sorry if my question is unclear; I'm happy to clarify if I missed something, as I am new to Stack Overflow.
Thanks in advance.
This is your query:
SELECT m.*,
(SELECT MAX(ABS((m2.home_score - m2.away_score)))
FROM tblform_matches m2
WHERE YEAR(m2.match_date) = YEAR(m.match_date) AND
m2.round = m.round
) as highest_margin
from tblform_matches m
where some condition;
Presumably, the best way to optimize this is to focus on "some condition", but we don't know what that is. Oh well. You will want the right indexes there.
Indexes are clearly the solution, but you have a problem because of the YEAR() function. An easy solution is to use inequalities:
SELECT m.*,
       (SELECT MAX(ABS(m2.home_score - m2.away_score))
        FROM tblform_matches m2
        WHERE m2.round = m.round and
              (m2.match_date >= makedate(year(m.match_date), 1) and
               m2.match_date < makedate(year(m.match_date) + 1, 1)
              )
       ) as highest_margin
from tblform_matches m
where some condition;
The best index for the subquery is tblform_matches(round, match_date, home_score, away_score). The first two columns are used for the where clause. The second two for the select.
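A sketch of that index (assuming MySQL; the name is illustrative):
CREATE INDEX idx_matches_round_date_scores ON tblform_matches (round, match_date, home_score, away_score);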
Note: if you made two relatively minor changes to the data structure, this could work even better. Add a column for the year of the match date (redundant, but important for indexing). And, add a column for the absolute value of the difference between the scores. Then the query would be:
SELECT m.*,
(SELECT MAX(score_diff)
FROM tblform_matches m2
WHERE m2.round = m.round and m2.matchyear = m.matchyear
) as highest_margin
from tblform_matches m
where some condition;
The index on this query would be: tblform_matches(round, matchyear, score_diff) and the lookup should be pretty fast.
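A sketch of those two extra columns and the index (assuming MySQL, and that the new columns are kept in sync by the application or a trigger; all names are illustrative):
ALTER TABLE tblform_matches
    ADD COLUMN matchyear SMALLINT,
    ADD COLUMN score_diff INT;
UPDATE tblform_matches
SET matchyear = YEAR(match_date),
    score_diff = ABS(home_score - away_score);
CREATE INDEX idx_matches_round_year_diff ON tblform_matches (round, matchyear, score_diff);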
EDIT:
You may get better performance with an explicit join:
SELECT m.*, mm.highest_margin
from tblform_matches m join
     (select year(m2.match_date) as match_year, m2.round,
             MAX(ABS(m2.home_score - m2.away_score)) as highest_margin
      from tblform_matches m2
      group by year(m2.match_date), m2.round
     ) mm
     on year(m.match_date) = mm.match_year and m.round = mm.round
where some condition;
I'm working with a JOIN plus UNION plus GROUP BY query, and I've developed something like the one below:
SELECT *
FROM (
(SELECT countries_listing.id,
countries_listing.country,
1 AS is_country
FROM countries_listing
LEFT JOIN product_prices ON (product_prices.country_id = countries_listing.id)
WHERE countries_listing.status = 'Yes'
AND product_prices.product_id = '3521')
UNION
(SELECT countries_listing.id,
countries_listing.country,
0 AS is_country
FROM countries_listing
WHERE countries_listing.id NOT IN
(SELECT country_id
FROM product_prices
WHERE product_id='3521')
AND countries_listing.status='Yes')) AS partss
GROUP BY id
ORDER BY country
I just realised that this query is taking a lot of time to return results, almost 8 seconds.
Is there any way to optimize this query to make it faster?
If I understand the logic correctly, you just want to add a flag for the country as to whether or not there is a price for a given product. I think you can use an exists clause to get what you want:
SELECT cl.id, cl.country,
(exists (SELECT 1
FROM product_prices pp
WHERE pp.country_id = cl.id AND
pp.product_id = '3521'
)
) as is_country
FROM countries_listing cl
WHERE cl.status = 'Yes'
ORDER BY country;
For performance, you want two indexes: countries_listing(status, country) and product_prices(country_id, product_id).
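A sketch of those two indexes (assuming MySQL; the names are illustrative):
CREATE INDEX idx_countries_status_country ON countries_listing (status, country);
CREATE INDEX idx_prices_country_product ON product_prices (country_id, product_id);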
Depending on how often it is executed, prepared statements could help. See PDO for more information.
I'm having a good time coding a little visitor counter. It's a PHP5/SQLite3 mix.
I made two database tables, one for the visitors and one for the hits. Structure and sample data:
CREATE TABLE 'visitors' (
'id' INTEGER DEFAULT NULL PRIMARY KEY AUTOINCREMENT,
'ip' TEXT DEFAULT NULL,
'hash' TEXT DEFAULT NULL,
UNIQUE(ip)
);
INSERT INTO "visitors" ("id","ip","hash") VALUES ('1','1.2.3.4','f9702c362aa9f1b05002804e3a65280b');
INSERT INTO "visitors" ("id","ip","hash") VALUES ('2','1.2.3.5','43dc8b0a4773e45deab131957684867b');
INSERT INTO "visitors" ("id","ip","hash") VALUES ('3','1.2.3.6','9ae1c21fc74b2a3c1007edf679c3f144');
CREATE TABLE 'hits' (
'id' INTEGER DEFAULT NULL PRIMARY KEY AUTOINCREMENT,
'time' INTEGER DEFAULT NULL,
'visitor_id' INTEGER DEFAULT NULL,
'host' TEXT DEFAULT NULL,
'location' TEXT DEFAULT NULL
);
INSERT INTO "hits" ("id","time","visitor_id","host","location") VALUES ('1','1418219548','1','localhost','/some/path/example.php');
INSERT INTO "hits" ("id","time","visitor_id","host","location") VALUES ('2','1418219550','1','localhost','/some/path/example.php');
INSERT INTO "hits" ("id","time","visitor_id","host","location") VALUES ('3','1418219553','1','localhost','/some/path/example.php');
INSERT INTO "hits" ("id","time","visitor_id","host","location") VALUES ('4','1418219555','2','localhost','/some/path/example.php');
INSERT INTO "hits" ("id","time","visitor_id","host","location") VALUES ('5','1418219557','1','localhost','/some/path/example.php');
INSERT INTO "hits" ("id","time","visitor_id","host","location") VALUES ('6','1418219558','3','localhost','/some/path/example.php');
I now want to fetch the visitors' data, but only for those who were active in the last 30 seconds, for example. I need the following data as output, here with user id 1 as an example:
$visitor = Array(
[id] => 1
[ip] => 1.2.3.4
[hash] => f9702c362aa9f1b05002804e3a65280b
[first_hit] => 1418219548
[last_hit] => 1418219557
[last_host] => localhost
[last_location] => /some/path/example.php
[total_hits] => 4
[idle_since] => 11
)
I'll get this with my current query, all good, but as you can see I need a lot of sub-selects for it:
SELECT
visitors.id,
visitors.ip,
visitors.hash,
(SELECT hits.time FROM hits WHERE hits.visitor_id = visitors.id ORDER BY hits.id ASC LIMIT 1) AS first_hit,
(SELECT hits.time FROM hits WHERE hits.visitor_id = visitors.id ORDER BY hits.id DESC LIMIT 1) AS last_hit,
(SELECT hits.host FROM hits WHERE hits.visitor_id = visitors.id ORDER BY hits.id DESC LIMIT 1) AS last_host,
(SELECT hits.location FROM hits WHERE hits.visitor_id = visitors.id ORDER BY hits.id DESC LIMIT 1) AS last_location,
(SELECT COUNT(hits.id) FROM hits WHERE hits.visitor_id = visitors.id) AS total_hits,
(SELECT strftime('%s','now') - hits.time FROM hits WHERE hits.visitor_id = visitors.id ORDER BY hits.id DESC LIMIT 1) AS idle_since
FROM visitors
WHERE idle_since < 30
ORDER BY last_hit DESC
So, is this OK for my use case, or do you know a better approach to get this data out of those two tables? I already played around with JOINs, but no matter how I tweaked them, COUNT() gave me wrong outputs, e.g. user id 1 ended up with only one total hit.
I probably have to re-model the database if I want to use JOINs properly, I guess.
Update: based on AeroX's answer I've built the new query below. It basically had just one little bug: you can't use MAX() in a WHERE clause, so I'm now using HAVING after the GROUP BY.
I also tested both the old and the new query with EXPLAIN and EXPLAIN QUERY PLAN, and the new one looks much better. Thank you guys!
SELECT
V.id,
V.ip,
V.hash,
MIN(H.time) AS first_hit,
MAX(H.time) AS last_hit,
strftime('%s','now') - MAX(H.time) AS idle_since,
COUNT(H.id) AS total_hits,
LH.host AS last_host,
LH.location AS last_location
FROM visitors AS V
INNER JOIN hits AS H ON (V.id = H.visitor_id)
INNER JOIN (
SELECT visitor_id, MAX(id) AS id
FROM hits
GROUP BY visitor_id
) AS L ON (V.id = L.visitor_id)
INNER JOIN hits AS LH ON (L.id = LH.id)
GROUP BY V.id, V.ip, V.hash, LH.host, LH.location
HAVING idle_since < 30
ORDER BY last_hit DESC
You probably want to clean this up, but it should give you the idea of how to make the joins and how to use GROUP BY to aggregate your hits table for each visitor. This should be more efficient than using lots of sub-queries.
I've included comments on the joins so that you can see why I'm making them.
SELECT
V.id,
V.ip,
V.hash,
MIN(H.time) AS first_hit,
MAX(H.time) AS last_hit,
COUNT(H.id) AS total_hits,
strftime('%s','now') - MAX(H.time) AS idle_since,
LH.host AS last_host,
LH.location AS last_location
FROM visitors AS V
-- Join hits table so we can calculate aggregates (MIN/MAX/COUNT)
INNER JOIN hits AS H ON (V.id = H.visitor_id)
-- Join a sub-query as a table which contains the most recent hit.id for each visitor.id
INNER JOIN (
SELECT visitor_id, MAX(id) AS id
FROM hits
GROUP BY visitor_id
) AS L ON (V.id = L.visitor_id)
-- Use the most recent hit.id for each visitor.id to fetch that most recent row (for last_host/last_location)
INNER JOIN hits AS LH ON (L.id = LH.id)
GROUP BY V.id, V.ip, V.hash, LH.host, LH.location
HAVING idle_since < 30
ORDER BY last_hit DESC
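As an extra tweak beyond the query itself, an index on hits(visitor_id, time) should help both the aggregate join and the MAX(id) derived table (a sketch, assuming no such index exists yet):
CREATE INDEX IF NOT EXISTS idx_hits_visitor_time ON hits (visitor_id, time);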
One of the best ways to measure query performance is to use EXPLAIN.
From the SQLite documentation:
The EXPLAIN QUERY PLAN SQL command is used to obtain a high-level
description of the strategy or plan that SQLite uses to implement a
specific SQL query. Most significantly, EXPLAIN QUERY PLAN reports on
the way in which the query uses database indices. This document is a
guide to understanding and interpreting the EXPLAIN QUERY PLAN output.
Background information is available separately:
Notes on the query optimizer.
How indexing works.
The next generation query planner.
An EXPLAIN QUERY PLAN command returns zero or more rows of four
columns each. The column names are "selectid", "order", "from",
"detail". The first three columns contain an integer value. The final
column, "detail", contains a text value which carries most of the
useful information.
EXPLAIN QUERY PLAN is most useful on a SELECT statement, but may also
appear with other statements that read data from database tables
(e.g. UPDATE, DELETE, INSERT INTO ... SELECT).
An example of an explain query is:
EXPLAIN SELECT * FROM COMPANY WHERE Salary >= 20000;
http://www.tutorialspoint.com/sqlite/sqlite_explain.htm
Below are more complex usage examples.
How can I analyse a Sqlite query execution?
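For instance, a quick sketch of EXPLAIN QUERY PLAN against the hits table from this question (the exact output depends on your SQLite version and indexes):
EXPLAIN QUERY PLAN
SELECT visitor_id, MAX(id) FROM hits GROUP BY visitor_id;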
I have a query
$query = "SELECT DISTINCT report_date,weekreportDate FROM contract_sales a
INNER JOIN contract b ON a.contract_UUID = b.UUID
INNER JOIN geoPoint c ON b.customer_UUID = c.customerUUID
WHERE c.com_UUID = '$com' AND a.report_date >= Date('$dateafter')
AND c.city_UUID = '$cit' ORDER BY `report_date`";
What I need to do is first get rid of most of the results via date filtering, but as you can see I get everything and then do my date checks afterwards.
I am inner joining all of them - is there a better way to do this?
I have a report for each date and two years of data, but I only want dates in 2014, so 700+ dates are useless to me right away; yet I have to go through all of them and check the other UUID strings as well. What can I do to speed up my working, albeit slow, implementation?
Explain information as requested:
Generation Time: Feb 20, 2014 at 06:48 PM
Generated by: phpMyAdmin 3.3.10.4 / MySQL 5.1.53-log
SQL query: EXPLAIN SELECT DISTINCT report_date,weekreportDate FROM contract_sales a INNER JOIN contract b ON a.contract_UUID = '1234' INNER JOIN geoPoint c ON b.customer_UUID = '1234' WHERE c.com_UUID = '1234' AND a.report_date >= Date('2014-01-01') AND c.city_UUID = '1234' ORDER BY `report_date`;
Rows: 3
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE a ref uuid_conlcs uuid_conlcs 110 const 1 Using where; Using temporary; Using filesort
1 SIMPLE b ref uuid_cust uuid_cust 110 const 1 Using where; Using index; Distinct
1 SIMPLE c ref uuid_gargp,uuid_citgp uuid_citgp 110 const 1 Using where; Distinct
First, a rewrite of your query. I would not suggest using aliases like just a, b, c, but something closer to the context of each table: "cs" for contract_sales, "con" for contract, and "gp" for geoPoint. This is especially helpful in larger, more complex queries.
Also, ALWAYS try to qualify columns with table.column (or alias.column); weekreportDate is not clear on its own, but it appears to belong to your contract_sales table.
As for indexes in this construct, I would have an index on contract_sales (report_date, weekreportDate, contract_UUID). This way it's a covering index that handles the columns being retrieved, the WHERE clause, the ORDER BY, and the join to the contract table without having to go back to the raw data pages.
For the contract table, I would have an index on (UUID, customer_UUID), again a covering index, for the join from contract_sales and to support the join to the geoPoint table.
Finally, for your geoPoint table, an index on (customerUUID, com_UUID, city_UUID) to cover the join and your filtering criteria.
SELECT DISTINCT
cs.report_date,
cs.weekreportDate
FROM
contract_sales cs
INNER JOIN contract con
ON cs.contract_UUID = con.UUID
INNER JOIN geoPoint gp
ON con.customer_UUID = gp.customerUUID
AND gp.com_UUID = '$com'
AND gp.city_UUID = '$cit'
WHERE
cs.report_date >= Date('$dateafter')
ORDER BY
cs.report_date
Now that being said - I don't know the volume of your tables - but if you are looking for records for a particular COM/city, I would suspect that the qualifying records are a much smaller set than ALL COM/city combinations for the date range in question. So I would reverse the query as below, hoping the smaller dataset queries faster, but you would obviously have to try both out.
SELECT DISTINCT
cs.report_date,
cs.weekreportDate
FROM
geoPoint gp
INNER JOIN contract con
ON gp.customerUUID = con.customer_UUID
JOIN contract_sales cs
ON con.UUID = cs.contract_UUID
AND cs.report_date >= Date('$dateafter')
WHERE
gp.com_UUID = '$com'
AND gp.city_UUID = '$cit'
ORDER BY
cs.report_date
Actually, for this reversed version, the geoPoint index should put your WHERE criteria first, then the customer UUID for the join to the next table: (com_UUID, city_UUID, customerUUID). The contract_sales index becomes (contract_UUID, report_date), and the contract table index (customer_UUID, UUID), to match the flow of joins in this query.
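A sketch of those indexes for the reversed query (assuming MySQL; the names are illustrative and only needed if equivalent indexes don't already exist):
CREATE INDEX idx_geopoint_com_city_cust ON geoPoint (com_UUID, city_UUID, customerUUID);
CREATE INDEX idx_contract_cust_uuid ON contract (customer_UUID, UUID);
CREATE INDEX idx_cs_contract_date ON contract_sales (contract_UUID, report_date);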
I would like to seek some help with my query. What I want is: if a specific atic or oaic is empty in the table, then the interview_sum or other_sum for that atic/oaic should be empty too. Does anyone know how to do that?
Current query (it still gives numbers for other_sum or interview_sum even when the value is empty):
SELECT DISTINCT
IF(t.inttotal=NULL,0,(SELECT SUM(t2.inttotal)
FROM app_interview2 AS t2
WHERE t2.atic = t.atic AND t2.inttotal>0)/7)
AS interview_sum,
IF(o.ototal=NULL,0,(SELECT SUM(o2.ototal)
FROM other_app2 AS o2
WHERE o2.oaic = o.oaic AND o2.ototal>0)/7)
AS other_sum,
atid,
atic,
atname,
region,
town,
uniq_id,
position,
salary_grade,
salary
FROM app_interview2 AS t, other_app2 AS o
GROUP BY t.atname HAVING COUNT(DISTINCT t.atic)
I made a few assumptions:
You probably have a table that app_interview2.atic and other_app2.oaic are the foreign keys of, but since you did not share it, I derived a table in the FROM clause.
This assumes atname is always the same for a given atid.
You are also dividing by 7 - which I assume is to get the average, so I used the AVG function.
Solution:
SELECT t1.id AS atid
,interview.atname AS atname
,COALESCE(interview.interviewsum, 0) AS interviewsum
,COALESCE(interview.interviewavg,0) AS interviewavg
,COALESCE(other.othersum, 0) AS othersum
,COALESCE(other.otheravg, 0) AS otheravg
FROM (SELECT DISTINCT atid AS id
FROM app_interview2
UNION
SELECT DISTINCT oaic
FROM other_app2) AS t1
LEFT JOIN (SELECT atid, atname, SUM(inttotal) AS interviewsum, AVG(inttotal) AS interviewavg
FROM app_interview2
GROUP BY atid, atname) as interview
ON interview.atid = t1.id
LEFT JOIN (SELECT oaic, SUM(ototal) AS othersum, AVG(ototal) AS otheravg
FROM other_app2
GROUP BY oaic) AS other
ON other.oaic = t1.id;
If this gives the results you were hoping for, I would replace the t1 derived table in the FROM clause with the table whose primary key I described above, which probably also has the columns (e.g., region, town, etc.) that I did not include.