I have this query in MySQL that is taking too long to run. I know the problem is the selectors with correlated subqueries (the COALESCE((SELECT ...)) parts), but I do not know how to speed the query up by rewriting them as a join.
I am hoping some of you SQL gurus will be able to help me.
SELECT
    COALESCE(
        (SELECT CONCAT(d.PRIJEVOZNIK, ' ', d.VOZAC_TRANSFER)
         FROM dokum_zag AS d
         WHERE d.SIFKNJ = 'NP'
           AND d.ID_VEZA = dokum_zag.ID
           AND d.korisnicko_ime = dokum_zag.korisnicko_ime
        ), '') AS PRIJEVOZNIK,
    RELACIJA_TRANS_VOZ_TRANS AS RELACIJA_TRANS_VOZ,
    PRIJEVOZNIK_POVRATNI_TRANS AS PRIJEVOZNIK_POVRATNI,
    VAUC_KNJIZENO_TRANS AS VAUC_KNJIZENO,
    ID_NALOGA,
    ID_NALOGA_POV,
    ID_VAUCHER,
    DOLAZAK, VRIJ_TRANSFER, ODLAZAK, VRIJEME_LETA_POVRAT, BRDOK, NOSITELJ_REZ, RELACIJA_TRANS,
    VOZILO_NAZIV, BROJ_NALOGA, BROJ_NAL_POV, BROJ_VAUCHER, BROJ_SOBE, VALIZN, PAX, MPIZN, ID
FROM
    dokum_zag
WHERE
    korisnicko_ime = '10'
    AND ((DOLAZAK = '2015-07-30') OR (ODLAZAK = '2015-07-30'))
    AND STORNO <> 'DA'
    AND SIFKNJ = 'TR'
    AND ((YEAR(DOLAZAK) = '2015') OR (YEAR(ODLAZAK) = '2015'))
ORDER BY
    (CASE WHEN DOLAZAK < '2015-07-30' THEN ODLAZAK ELSE DOLAZAK END),
    (CASE WHEN DOLAZAK < '2015-07-30' THEN VRIJEME_LETA_POVRAT ELSE VRIJ_TRANSFER END),
    ID
Without a DB structure and a description of what you want to extract, it's a bit hard to help you.
From a logical point of view, some things are redundant, for example:
((DOLAZAK='2015-07-30') or (ODLAZAK='2015-07-30')) and
...
((YEAR(DOLAZAK)= '2015') or (YEAR(ODLAZAK)= '2015'))
The YEAR() part isn't necessary, since the year is already fixed by the exact dates in the first two conditions.
Another thing that might be driving the server crazy is that ORDER BY clause, since the sort expression changes from row to row (test this by ordering on a fixed field).
You can also check that your indexes are properly set for all the fields in the outer WHERE clause, and that the non-numeric columns are not VARCHARs when they don't need to be (for example, SIFKNJ and STORNO should probably be CHAR(2)).
The COALESCE part can be solved with an outer join, so it doesn't get calculated for each row. But that depends on what you want to extract from the database and how (the fact that the subquery has its own conditions in its WHERE section makes it a bit odd).
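For example, a rough sketch of that rewrite as a self-join (assuming, as the subquery suggests, that at most one 'NP' row links back to each 'TR' row via ID_VEZA and korisnicko_ime; the column list is abbreviated to t.* here):

SELECT
    COALESCE(CONCAT(np.PRIJEVOZNIK, ' ', np.VOZAC_TRANSFER), '') AS PRIJEVOZNIK,
    t.*
FROM dokum_zag AS t
LEFT JOIN dokum_zag AS np
       ON np.SIFKNJ = 'NP'
      AND np.ID_VEZA = t.ID
      AND np.korisnicko_ime = t.korisnicko_ime
WHERE t.korisnicko_ime = '10'
  AND (t.DOLAZAK = '2015-07-30' OR t.ODLAZAK = '2015-07-30')
  AND t.STORNO <> 'DA'
  AND t.SIFKNJ = 'TR'
ORDER BY
    (CASE WHEN t.DOLAZAK < '2015-07-30' THEN t.ODLAZAK ELSE t.DOLAZAK END),
    (CASE WHEN t.DOLAZAK < '2015-07-30' THEN t.VRIJEME_LETA_POVRAT ELSE t.VRIJ_TRANSFER END),
    t.ID;

If more than one 'NP' row can match a given order, the join will duplicate rows where the scalar subquery would have returned a single value, so check that the link really is one-to-one before switching.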
Hope this somehow helps
INDEX(korisnicko_ime, SIFKNJ)
(in either order) may help
Turning the correlated subquery into a JOIN may help.
((DOLAZAK='2015-07-30') or (ODLAZAK='2015-07-30')) and
((YEAR(DOLAZAK)= '2015') or (YEAR(ODLAZAK)= '2015'))
is a bit weird. This might help:
( SELECT ...
AND DOLAZAK ='2015-07-30' AND ODLAZAK >= '2015-01-01'
AND ODLAZAK < '2015-01-01' + INTERVAL 1 YEAR
) UNION DISTINCT
( SELECT ...
AND ODLAZAK ='2015-07-30' AND DOLAZAK >= '2015-01-01'
AND DOLAZAK < '2015-01-01' + INTERVAL 1 YEAR
) ORDER BY ...
To help that reformulation, add 2 composite indexes:
INDEX(korisnicko_ime, SIFKNJ, DOLAZAK, ODLAZAK)
INDEX(korisnicko_ime, SIFKNJ, ODLAZAK, DOLAZAK)
Related
I'm having some problems with a query that finds the next ID of an order with certain filters on it, like requiring it to be from a specific city, etc.
Currently it's used in a function that returns either the previous or the next ID relative to the current order, so it can be either MIN(id) or MAX(id), where MAX(id) is obviously faster, since it has to go through fewer rows.
The query works just fine, but it's rather slow, since it's going through 123953 rows to find the ID. Is there any way I could optimize this?
Function example:
SELECT $minmax(orders.orders_id) AS new_id
FROM orders
LEFT JOIN orders_status ON orders.orders_status = orders_status.orders_status_id $where_join
WHERE orders_status.language_id = '$languages_id'
  AND orders.orders_date_finished != '1900-01-01 00:00:00'
  AND orders.orders_id $largersmaller $ordersid $where;
Live example
SELECT min(orders.orders_id)
FROM orders
LEFT JOIN orders_status ON orders.orders_status = orders_status.orders_status_id
WHERE orders_status.language_id = '4'
AND orders.orders_date_finished != '1900-01-01 00:00:00'
AND orders.orders_id < 4868771
LIMIT 1
So, concluding:
SELECT orders.orders_id
FROM orders
JOIN orders_status ON orders.orders_status = orders_status.orders_status_id
WHERE orders_status.language_id = '4'
AND orders.orders_date_finished != '1900-01-01 00:00:00'
AND orders.orders_id < 4868771
ORDER BY orders.orders_id ASC
LIMIT 1
Extra:
To get the MAX value, use DESC where ASC is now.
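That is, the same query with only the sort direction changed:

SELECT orders.orders_id
FROM orders
JOIN orders_status ON orders.orders_status = orders_status.orders_status_id
WHERE orders_status.language_id = '4'
  AND orders.orders_date_finished != '1900-01-01 00:00:00'
  AND orders.orders_id < 4868771
ORDER BY orders.orders_id DESC
LIMIT 1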
And looking at your question: be sure to escape values like $languages_id and the rest; I suppose they come from some HTML form?
(or use prepared statements)
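A minimal sketch of the prepared-statement route, here using MySQL's own PREPARE/EXECUTE syntax (in PHP you would more likely use mysqli or PDO placeholders; the statement text simply mirrors the query above):

-- placeholders (?) replace the interpolated PHP variables
PREPARE next_order FROM
    'SELECT orders.orders_id
     FROM orders
     JOIN orders_status ON orders.orders_status = orders_status.orders_status_id
     WHERE orders_status.language_id = ?
       AND orders.orders_date_finished != ''1900-01-01 00:00:00''
       AND orders.orders_id < ?
     ORDER BY orders.orders_id ASC
     LIMIT 1';
SET @lang := 4, @current_id := 4868771;
EXECUTE next_order USING @lang, @current_id;
DEALLOCATE PREPARE next_order;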
I am having some issues with this part of my query; hopefully someone can advise me. Currently I am grouping on the complete postcode, but I would like to split that value and group by the first part of the postcode only.
Perhaps I am overlooking the simple solution to this?
Here is my current query:
select *, count(jobline_pickup_add_postalcode) as totals
FROM rhrj_jobline
WHERE jobline_date >= '$start' AND jobline_date <= '$end'
GROUP BY jobline_pickup_add_postalcode
Any advice would be awesome
EDIT:
My postcodes look like this
KA12 0RA - 1
KA15 1JG - 26
KA15 2AT - 1
KA15 2LF - 1
KA151JG/PA2 6LA - 2
SELECT
    SUBSTRING_INDEX(jobline_pickup_add_postalcode, ' ', 1) AS papc,
    COUNT(jobline_pickup_add_postalcode) AS totals
FROM rhrj_jobline
WHERE jobline_date >= '$start' AND jobline_date <= '$end'
GROUP BY papc
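As a quick sanity check against the sample data above, SUBSTRING_INDEX(value, ' ', 1) returns everything before the first space, so:

SELECT SUBSTRING_INDEX('KA12 0RA', ' ', 1);         -- 'KA12'
SELECT SUBSTRING_INDEX('KA151JG/PA2 6LA', ' ', 1);  -- 'KA151JG/PA2' (no space in the first part, so it is kept whole)

Values like 'KA151JG/PA2 6LA' will therefore form their own group unless they are cleaned up first.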
I have a query given below
SELECT A.order_no, A.order_date,
COUNT(B.reaction_no) as tot_reaction_no,
SUM(CASE
WHEN (B.purification != '') THEN 1
ELSE 0
END) as tot_purification
FROM order_header A
LEFT JOIN order_reactions B ON A.order_no = B.order_no
WHERE A.order_date BETWEEN '2015-10-01 00:00:00' AND '2016-09-01 00:00:00'
AND A.order_no = '23746'
GROUP BY A.order_no
This gives the results shown in the picture. But the result is wrong because some of the entries are duplicates, so I have to remove the duplicates and print the count. The count required is the count of "column" from table 1.
I think you need to leave A.order_date out of your SELECT, or else add it to the GROUP BY clause. That gives you a different result, though.
You may also use a subquery in your select clause:
SELECT A.order_no, A.order_date,
    (SELECT COUNT(*) FROM order_reactions AS r1
      WHERE r1.order_no = A.order_no AND r1.purification != '') AS tot_purification,  -- purifications for this order
    (SELECT COUNT(*) FROM order_reactions AS r2
      WHERE r2.order_no = A.order_no) AS tot_reaction_no                              -- all reactions for this order
FROM order_header A
WHERE A.order_date BETWEEN '2015-10-01 00:00:00' AND '2016-09-01 00:00:00'
AND A.order_no = '23746'
This is just off the top of my head; since your screenshots don't show the full tables, I'm not sure this is 100% right, but it might point you in the right direction.
I would propose the query
SELECT COUNT(DISTINCT clone_name) AS tot_purification,
       COUNT(*) AS tot_reaction_no
FROM Table2
WHERE `purification` = 'Column'
  AND `order_no` = 23746;
Please excuse any errors in quoting; MySQL is very confusing when it comes to quoting, in my opinion.
EDIT:
Added AS tot_purification
I'm not sure what you are expecting as tot_reaction_no, so I just counted all rows where Order and purification match as described in my WHERE clause.
It's because you are grouping only on A.order_no. Make the following change in the query and try again:
Replace the line:
GROUP BY A.order_no;
to
GROUP BY A.order_no, A.clone_name;
After searching and reading a little bit I came up with the following SQL query for my application:
SELECT
ROUND(AVG(CASE WHEN gender = 'M' THEN rating END), 1) avgAllM,
COUNT(CASE WHEN gender = 'M' THEN rating END) countAllM,
ROUND(AVG(CASE WHEN gender = 'F' THEN rating END), 1) avgAllF,
COUNT(CASE WHEN gender = 'F' THEN rating END) countAllF,
ROUND(AVG(CASE WHEN gender = 'M' AND UserAge(birth_date) <= 18 THEN rating END), 1) avgU18M,
COUNT(CASE WHEN gender = 'M' AND UserAge(birth_date) <= 18 THEN rating END) countU18M,
ROUND(AVG(CASE WHEN gender = 'F' AND UserAge(birth_date) <= 18 THEN rating END), 1) avgU18F,
COUNT(CASE WHEN gender = 'F' AND UserAge(birth_date) <= 18 THEN rating END) countU18F
FROM movie_ratings mr INNER JOIN accounts a
ON mr.aid = a.aid
WHERE mid = 5;
And I'm wondering how I can simplify this, if possible. The birth_date field is of type DATE and UserAge is a function that calculates the age from that date field.
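(A minimal stand-in for UserAge, for anyone who wants to reproduce this, could be something like the following; the real definition may differ.)

-- hypothetical stand-in for the UserAge() function mentioned above
CREATE FUNCTION UserAge(d DATE) RETURNS INT
NO SQL
RETURN TIMESTAMPDIFF(YEAR, d, CURDATE());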
The table structures are as follows:
[ACCOUNTS]
aid(PK), birth_date, gender
[MOVIE_RATINGS]
mid(PK), aid(PK,FK), rating
I'm looking for two things:
General simplifications to the code above that more experienced users know about that I don't.
I'm doing this in PHP and for each record I'll have an associative array with all those variables. I'm looking for a way to group them into a multidimensional array, so the PHP code is easier to read. Of course I don't want to do this in PHP itself, it would be pointless.
For instance, something like this:
$info[0]['avgAllM']
$info[0]['countAllM']
$info[1]['avgAllF']
$info[1]['countAllF']
$info[2]['avgU18M']
$info[2]['countU18M']
$info[3]['avgU18F']
$info[3]['countU18F']
Instead of:
$info['avgAllM']
$info['countAllM']
$info['avgAllF']
$info['countAllF']
$info['avgU18M']
$info['countU18M']
$info['avgU18F']
$info['countU18F']
I don't even know if this is possible, so I'm really wondering if it is and how it can be done.
Why do I want all this? Well, the SQL query above is just a fragment of the complete SQL I need to write. I haven't done it yet because, before doing all the work, I want to know if there's a more compact SQL query that achieves the same result. Basically, I'll add a few more lines like the ones above but with different conditions, especially on the date.
You could create a VIEW with the following definition
SELECT
CASE WHEN gender = 'M' THEN rating END AS AllM,
CASE WHEN gender = 'F' THEN rating END AS AllF,
CASE WHEN gender = 'M' AND UserAge(birth_date) <= 18 THEN rating END AS U18M,
CASE WHEN gender = 'F' AND UserAge(birth_date) <= 18 THEN rating END AS U18F
FROM movie_ratings mr INNER JOIN accounts a
ON mr.aid = a.aid
WHERE mid = 5
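For example, a minimal sketch of creating it (the name yourview is just a placeholder, matching the query below; note that the mid = 5 filter is baked into the view here):

CREATE VIEW yourview AS
SELECT
    CASE WHEN gender = 'M' THEN rating END AS AllM,
    CASE WHEN gender = 'F' THEN rating END AS AllF,
    CASE WHEN gender = 'M' AND UserAge(birth_date) <= 18 THEN rating END AS U18M,
    CASE WHEN gender = 'F' AND UserAge(birth_date) <= 18 THEN rating END AS U18F
FROM movie_ratings mr
INNER JOIN accounts a ON mr.aid = a.aid
WHERE mid = 5;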
Then SELECT from that
SELECT ROUND(AVG(AllM), 1) avgAllM,
COUNT(AllM) countAllM,
ROUND(AVG(AllF), 1) avgAllF,
COUNT(AllF) countAllF,
ROUND(AVG(U18M), 1) avgU18M,
COUNT(U18M) countU18M,
ROUND(AVG(U18F), 1) avgU18F,
COUNT(U18F) countU18F
FROM yourview
Might simplify things slightly?
This could just be a case of optimizing too early. The query does what you need and only really looks complicated because it is. I'm not sure that there are necessarily any tricks that would help. It probably depends on the characteristics of your data. Is the query slow? Do you think it could be quicker?
It might be worth rearranging it in the following way. Since all the conditions rely on the ACCOUNTS table, which I assume is significantly smaller than the MOVIE_RATINGS table, you might be able to do all the calculations on a smaller data set, which might be quicker. Although if you are only selecting one movie at a time (mid = 5), that probably won't be the case.
I'm not entirely sure this will work, but I think it should.
SELECT
ROUND(AVG(rating * AllM), 1) avgAllM,
COUNT(rating * AllM) countAllM,
ROUND(AVG(rating * AllF), 1) avgAllF,
COUNT(rating * AllF) countAllF,
ROUND(AVG(rating * AllM * U18), 1) avgU18M,
COUNT(rating * AllM * U18) countU18M,
ROUND(AVG(rating * AllF * U18), 1) avgU18F,
COUNT(rating * AllF * U18) countU18F
FROM
movie_ratings mr
INNER JOIN (
select
aid,
case when gender = 'M' then 1 end as AllM,
case when gender = 'F' then 1 end as AllF,
case when UserAge(birth_date) <= 18 then 1 end as U18
from accounts) a ON mr.aid = a.aid
WHERE mid = 5;
On balance though, I would probably just leave the query you have as it is. The query that you have is easy to understand and probably performs fairly well.
I have the query below. There are 4 main tables involved: tblOrder, tblItem, tblOrder_archive and tblItem_archive. Orders and items get moved over to the archived versions of the tables after a few months so as not to slow down the main table queries (sales and traffic are really high). So to get sales figures, I select what I need from each set of tables (archive and non-archive), union them, do a GROUP BY on the union, then do some math on the result.
The problem is that with any significant number of rows (i.e. a wide order time span), the query takes so long to run that it times out. I have added all the keys I can think of and it is still running super slow.
Is there more I can do to make this run faster? Can I write it differently? Can I use different indexes?
Or should I write a script that pulls the data from each table set first and then does the math in the PHP script to combine them?
Thanks for the help.
SELECT
description_invoice
, supplier
, type
, sum(quantity) AS num_sold
, sum(quantity*wholesale) AS wholesale_price
, sum(quantity*price) AS retail_price
, sum(quantity*price) - sum(quantity*wholesale) AS profit
FROM (
SELECT
tblOrder.*
, tblItem.description_invoice
, tblItem.type
, tblItem.product_number
, tblItem.quantity
, tblItem.wholesale
, tblItem.price
, tblItem.supplier
FROM tblOrder USE KEY (finalized), tblItem
WHERE
tblItem.order_id = tblOrder.order_id
AND
finalized=1
AND
wholesale <> 0
AND (order_time >= 1251788400 AND order_time <= 1283669999)
UNION
SELECT
tblOrder_archive.*
, tblItem_archive.description_invoice
, tblItem_archive.type
, tblItem_archive.product_number
, tblItem_archive.quantity
, tblItem_archive.wholesale
, tblItem_archive.price
, tblItem_archive.supplier
FROM tblOrder_archive USE KEY (finalized), tblItem_archive
WHERE
tblItem_archive.order_id=tblOrder_archive.order_id
AND
finalized=1
AND
wholesale <> 0
AND (order_time >= 1251788400 AND order_time <= 1283669999)
) AS main_table
GROUP BY
description_invoice
, supplier,type
ORDER BY profit DESC;
Create indexes on the columns you are using in the WHERE clauses (see the sketch after these suggestions).
Remove the index hint: USE KEY (finalized). If it does anything at all it will probably just make it slower by causing MySQL to choose this key instead of a potentially better key.
Add a LIMIT to avoid fetching too many rows. Use paging if you want to see more rows.
Use UNION ALL instead of UNION. This will be faster because it doesn't check for duplicates and also you probably don't want to remove duplicates here anyway since this will affect the total.
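A rough sketch of the indexes the first suggestion is getting at, based only on the columns this query filters and joins on (index names are placeholders, and this is one reasonable column ordering, not the only one):

-- order tables: filtered on finalized and order_time, joined on order_id
ALTER TABLE tblOrder         ADD INDEX idx_final_time (finalized, order_time, order_id);
ALTER TABLE tblOrder_archive ADD INDEX idx_final_time (finalized, order_time, order_id);
-- item tables: joined on order_id, filtered on wholesale
ALTER TABLE tblItem          ADD INDEX idx_order_wholesale (order_id, wholesale);
ALTER TABLE tblItem_archive  ADD INDEX idx_order_wholesale (order_id, wholesale);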
Orders and Items get moved over to the archived versions of the tables after a few months as not to slow down the main table queries.
This is probably a bad idea. Instead, you should index your data correctly so that the queries don't become significantly slower as you add more data. Alternatively, you could look at partitioning the table.
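A heavily simplified sketch of what range partitioning on order_time could look like, assuming order_time is a Unix-timestamp INT column and that it is (or can be made) part of every unique key on the table, which MySQL partitioning requires:

ALTER TABLE tblOrder
    PARTITION BY RANGE (order_time) (
        PARTITION p2009 VALUES LESS THAN (1262304000),  -- < 2010-01-01 UTC
        PARTITION p2010 VALUES LESS THAN (1293840000),  -- < 2011-01-01 UTC
        PARTITION pmax  VALUES LESS THAN MAXVALUE
    );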
I re-wrote your query as:
SELECT COALESCE(x.description_invoice, y.description_invoice) AS description_invoice,
COALESCE(x.supplier, y.supplier) AS supplier,
COALESCE(x.type, y.type) AS type,
COALESCE(SUM(x.quantity), 0) + COALESCE(SUM(y.quantity), 0) as num_sold,
COALESCE(SUM(x.quantity * x.wholesale), 0) + COALESCE(SUM(y.quantity * y.wholesale), 0) AS wholesale_price,
COALESCE(SUM(x.quantity * x.price), 0) + COALESCE(SUM(y.quantity * y.price), 0) AS retail_price,
COALESCE(SUM(x.quantity * x.price), 0) - COALESCE(SUM(x.quantity * x.wholesale), 0) + COALESCE(SUM(y.quantity * y.price), 0) - COALESCE(SUM(y.quantity * y.wholesale), 0) as profit
FROM (SELECT o.order_id
FROM TBLORDER o
WHERE o.finalized = 1
AND o.order_time BETWEEN 1251788400
AND 1283669999
UNION ALL
SELECT oa.order_id
FROM TBLORDER_ARCHIVE oa
WHERE oa.finalized = 1
AND oa.order_time BETWEEN 1251788400
AND 1283669999) a
LEFT JOIN TBLITEM x ON x.order_id = a.order_id
AND x.wholesale != 0
LEFT JOIN TBLITEM_ARCHIVE y ON y.order_id = a.order_id
AND y.wholesale != 0
GROUP BY description_invoice, supplier, type
ORDER BY profit DESC
Your query had UNION, but I wouldn't expect to need duplicate removal against an archive table, so I changed it to UNION ALL, which is faster because it doesn't remove duplicates.
For what you provided, you had SELECT tblOrder.* and SELECT tblOrder_archive.* but never used any of those columns.
The aggregation functions (SUM) were all on TBLITEM columns, which were unnecessarily pulled inside the derived table/inline view.
I omitted the USE KEY (finalized); you can re-add it if you like, but I'd compare with and without it. I'd also suggest running ANALYZE TABLE occasionally on both tables prior to running the query so the optimizer has relatively fresh statistics.
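That statistics refresh is a one-liner:

ANALYZE TABLE tblOrder, tblOrder_archive;  -- refresh the index statistics the optimizer uses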
I don't see much value in an index on the finalized column by itself, but I don't know your data or usage, just this query. Based on this query, I'd index:
order_id
order_time
finalized
...as a covering index: a single index with the three columns above, in the order provided, because column order matters in a covering index.
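In DDL form, on both order tables (index name is a placeholder), that would be something like:

ALTER TABLE tblOrder         ADD INDEX idx_cover (order_id, order_time, finalized);
ALTER TABLE tblOrder_archive ADD INDEX idx_cover (order_id, order_time, finalized);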
I rewrote it as follows based on your help and added the recommended covering index to both tblOrder and tblOrder_archive, and things seem to be much faster. But I'm still wondering if there's something more to the way you wrote it; I would need to use tblItem_archive joined to tblOrder_archive as well.
SELECT
description_invoice
, supplier
, type
, sum(quantity) AS num_sold
, sum(quantity*wholesale) AS wholesale_price
, sum(quantity*price) AS retail_price
, sum(quantity*price) - sum(quantity*wholesale) AS profit
FROM (
SELECT
tblOrder.order_id
, tblItem.description_invoice
, tblItem.type
, tblItem.product_number
, tblItem.quantity
, tblItem.wholesale
, tblItem.price
, tblItem.supplier
FROM tblOrder, tblItem
WHERE
tblItem.order_id = tblOrder.order_id
AND
finalized=1
AND
wholesale <> 0
AND (order_time >= 1251788400 AND order_time <= 1283669999)
UNION ALL
SELECT
tblOrder_archive.order_id
, tblItem_archive.description_invoice
, tblItem_archive.type
, tblItem_archive.product_number
, tblItem_archive.quantity
, tblItem_archive.wholesale
, tblItem_archive.price
, tblItem_archive.supplier
FROM tblOrder_archive, tblItem_archive
WHERE
tblItem_archive.order_id=tblOrder_archive.order_id
AND
finalized=1
AND
wholesale <> 0
AND (order_time >= 1251788400 AND order_time <= 1283669999)
) AS main_table
GROUP BY
description_invoice
, supplier,type
ORDER BY profit DESC;