I have this query below. There are 4 main tables involved: tblOrder, tblItem, tblOrder_archive, tblItem_archive. Orders and items get moved over to the archived versions of the tables after a few months, so as not to slow down the main table queries (sales and traffic are REALLY HIGH). So to get sales figures, I select what I need from each set of tables (archive and non-archive), union them, do a GROUP BY on the union, then do some math on the result.
The problem is that with any significant number of rows (a wide order time span), the query takes so long that it times out. I have added all the keys I can think of and it still runs super slow.
Is there more I can do to make this run faster? Can I write it differently? Can I use different indexes?
Or should I write a script that gets the data from each table set first and then does the math in PHP to combine them?
Thanks for the help.
SELECT
description_invoice
, supplier
, type
, sum(quantity) AS num_sold
, sum(quantity*wholesale) AS wholesale_price
, sum(quantity*price) AS retail_price
, sum(quantity*price) - sum(quantity*wholesale) AS profit
FROM (
SELECT
tblOrder.*
, tblItem.description_invoice
, tblItem.type
, tblItem.product_number
, tblItem.quantity
, tblItem.wholesale
, tblItem.price
, tblItem.supplier
FROM tblOrder USE KEY (finalized), tblItem
WHERE
tblItem.order_id = tblOrder.order_id
AND
finalized=1
AND
wholesale <> 0
AND (order_time >= 1251788400 AND order_time <= 1283669999)
UNION
SELECT
tblOrder_archive.*
, tblItem_archive.description_invoice
, tblItem_archive.type
, tblItem_archive.product_number
, tblItem_archive.quantity
, tblItem_archive.wholesale
, tblItem_archive.price
, tblItem_archive.supplier
FROM tblOrder_archive USE KEY (finalized), tblItem_archive
WHERE
tblItem_archive.order_id=tblOrder_archive.order_id
AND
finalized=1
AND
wholesale <> 0
AND (order_time >= 1251788400 AND order_time <= 1283669999)
) AS main_table
GROUP BY
description_invoice
, supplier
, type
ORDER BY profit DESC;
Create indexes on the columns you are using in the WHERE clauses.
Remove the index hint: USE KEY (finalized). If it does anything at all, it will probably just make things slower by forcing MySQL to choose this key instead of a potentially better one.
Add a LIMIT to avoid fetching too many rows. Use paging if you want to see more rows.
Use UNION ALL instead of UNION. This is faster because it doesn't check for duplicates, and you probably don't want to remove duplicates here anyway, since that would affect the totals.
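To see concretely why UNION ALL matters for totals, here is a minimal sketch (Python's sqlite3 standing in for MySQL, with made-up tables `cur` and `arch`): an identical line item in the archive gets silently merged away by UNION, skewing the sums.

```python
import sqlite3

# UNION de-duplicates rows across the two SELECTs, which merges identical
# line items and understates the totals; UNION ALL keeps every row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cur  (qty INTEGER, price REAL);
CREATE TABLE arch (qty INTEGER, price REAL);
INSERT INTO cur  VALUES (2, 10.0);
INSERT INTO arch VALUES (2, 10.0);   -- identical row in the archive
""")
union_total = conn.execute(
    "SELECT SUM(qty * price) FROM (SELECT * FROM cur UNION SELECT * FROM arch)"
).fetchone()[0]
union_all_total = conn.execute(
    "SELECT SUM(qty * price) FROM (SELECT * FROM cur UNION ALL SELECT * FROM arch)"
).fetchone()[0]
print(union_total, union_all_total)  # 20.0 40.0
```

The real tables include order_id, so true duplicates are less likely, but any row UNION happens to collapse silently corrupts the aggregates.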
Orders and Items get moved over to the archived versions of the tables after a few months as not to slow down the main table queries.
This is probably a bad idea. Instead, you should index your data correctly so that queries don't slow down significantly as you add more data. Alternatively, you could look at partitioning the table.
I re-wrote your query as:
SELECT COALESCE(x.description_invoice, y.description_invoice) AS description_invoice,
COALESCE(x.supplier, y.supplier) AS supplier,
COALESCE(x.type, y.type) AS type,
COALESCE(SUM(x.quantity), 0) + COALESCE(SUM(y.quantity), 0) as num_sold,
COALESCE(SUM(x.quantity * x.wholesale), 0) + COALESCE(SUM(y.quantity * y.wholesale), 0) AS wholesale_price,
COALESCE(SUM(x.quantity * x.price), 0) + COALESCE(SUM(y.quantity * y.price), 0) AS retail_price,
COALESCE(SUM(x.quantity * x.price), 0) - COALESCE(SUM(x.quantity * x.wholesale), 0) + COALESCE(SUM(y.quantity * y.price), 0) - COALESCE(SUM(y.quantity * y.wholesale), 0) as profit
FROM (SELECT o.order_id
FROM TBLORDER o
WHERE o.finalized = 1
AND o.order_time BETWEEN 1251788400
AND 1283669999
UNION ALL
SELECT oa.order_id
FROM TBLORDER_ARCHIVE oa
WHERE oa.finalized = 1
AND oa.order_time BETWEEN 1251788400
AND 1283669999) a
LEFT JOIN TBLITEM x ON x.order_id = a.order_id
AND x.wholesale != 0
LEFT JOIN TBLITEM_ARCHIVE y ON y.order_id = a.order_id
AND y.wholesale != 0
GROUP BY description_invoice, supplier, type
ORDER BY profit DESC
Your query had UNION, but I'd expect not to need duplicate removal across an archive table, so I changed it to UNION ALL, which is faster because it doesn't remove duplicates.
From what you provided, you had SELECT tblOrder.* and SELECT tblOrder_archive.* but never used any of those columns.
The aggregation functions (SUM) all operate on tblItem columns, which were unnecessarily buried inside the derived table/inline view.
I omitted the USE KEY (finalized) hint; you can re-add it if you like, but I'd compare performance with and without it. I'd also suggest running ANALYZE TABLE occasionally on both tables before running the query, so the optimizer has relatively fresh statistics.
I don't see much value in an index on the finalized column alone, but I don't know your data or usage, just this query. Based on this query, I'd index:
order_id
order_time
finalized
...as a covering index: a single index with all three columns, in the order given, because column order matters in a covering index.
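As a rough illustration of what "covering" means, here is a sketch using SQLite's EXPLAIN QUERY PLAN from Python as a stand-in for MySQL's EXPLAIN (the planners differ, so treat this only as a demonstration of the concept; in this toy index the equality-tested column leads, and whichever order you choose is worth verifying with EXPLAIN on your real data):

```python
import sqlite3

# With all three referenced columns in one index, the inner query on
# tblOrder can be answered from the index alone, with no table lookups;
# SQLite reports this as a "COVERING INDEX" in the query plan.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblOrder (order_id INTEGER, order_time INTEGER, finalized INTEGER)"
)
conn.execute(
    "CREATE INDEX idx_cover ON tblOrder (finalized, order_time, order_id)"
)
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT order_id FROM tblOrder
    WHERE finalized = 1 AND order_time BETWEEN 1251788400 AND 1283669999
""").fetchall()
detail = " ".join(row[-1] for row in plan)
print(detail)  # e.g. "SEARCH tblOrder USING COVERING INDEX idx_cover ..."
```

If the plan instead shows a plain index search followed by table lookups, the index isn't covering the query and the column list or order needs adjusting.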
I rewrote it as follows based on your help, added the recommended covering index to both tblOrder and tblOrder_archive, and things seem much faster. But I'm still wondering if there's something more to the way you wrote it; I would need to join tblItem_archive to tblOrder_archive as well.
SELECT
description_invoice
, supplier
, type
, sum(quantity) AS num_sold
, sum(quantity*wholesale) AS wholesale_price
, sum(quantity*price) AS retail_price
, sum(quantity*price) - sum(quantity*wholesale) AS profit
FROM (
SELECT
tblOrder.order_id
, tblItem.description_invoice
, tblItem.type
, tblItem.product_number
, tblItem.quantity
, tblItem.wholesale
, tblItem.price
, tblItem.supplier
FROM tblOrder, tblItem
WHERE
tblItem.order_id = tblOrder.order_id
AND
finalized=1
AND
wholesale <> 0
AND (order_time >= 1251788400 AND order_time <= 1283669999)
UNION ALL
SELECT
tblOrder_archive.order_id
, tblItem_archive.description_invoice
, tblItem_archive.type
, tblItem_archive.product_number
, tblItem_archive.quantity
, tblItem_archive.wholesale
, tblItem_archive.price
, tblItem_archive.supplier
FROM tblOrder_archive, tblItem_archive
WHERE
tblItem_archive.order_id=tblOrder_archive.order_id
AND
finalized=1
AND
wholesale <> 0
AND (order_time >= 1251788400 AND order_time <= 1283669999)
) AS main_table
GROUP BY
description_invoice
, supplier
, type
ORDER BY profit DESC;
Related
I would like to better optimize my code. I'd like to have a single query that allows an alias name to have its own LIMIT and also include a result with no LIMIT.
Currently I'm using two queries like this:
// ALL TIME //
$mikep = mysqli_query($link, "SELECT tasks.EID, reports.how_did_gig_go FROM tasks INNER JOIN reports ON tasks.EID=reports.eid WHERE `priority` IS NOT NULL AND `partners_name` IS NOT NULL AND mike IS NOT NULL GROUP BY EID ORDER BY tasks.show_date DESC;");
$num_rows_mikep = mysqli_num_rows($mikep);
$rating_sum_mikep = 0;
while ($row = mysqli_fetch_assoc($mikep)) {
$rating_mikep = $row['how_did_gig_go'];
$rating_sum_mikep += $rating_mikep;
}
$average_mikep = $rating_sum_mikep/$num_rows_mikep;
// AND NOW WITH A LIMIT 10 //
$mikep_limit = mysqli_query($link, "SELECT tasks.EID, reports.how_did_gig_go FROM tasks INNER JOIN reports ON tasks.EID=reports.eid WHERE `priority` IS NOT NULL AND `partners_name` IS NOT NULL AND mike IS NOT NULL GROUP BY EID ORDER BY tasks.show_date DESC LIMIT 10;");
$num_rows_mikep_limit = mysqli_num_rows($mikep_limit);
$rating_sum_mikep_limit = 0;
while ($row = mysqli_fetch_assoc($mikep_limit)) {
$rating_mikep_limit = $row['how_did_gig_go'];
$rating_sum_mikep_limit += $rating_mikep_limit;
}
$average_mikep_limit = $rating_sum_mikep_limit/$num_rows_mikep_limit;
This allows me to show an all-time average and also an average over the last 10 reviews. Is it really necessary for me to set up two queries?
Also, I understand I could get the sum in the query, but not all the values are numbers, so I've actually converted them in PHP, but left out that code in order to try and simplify what is displayed in the code.
All-time average and average over the last 10 reviews
In the best-case scenario, where your column how_did_gig_go is 100% numeric, a single query like this could work:
SELECT
AVG(how_did_gig_go) AS avg_how_did_gig_go
, SUM(CASE
WHEN rn <= 10 THEN how_did_gig_go
ELSE 0
END) / 10 AS latest10_avg
FROM (
SELECT
@num := @num + 1 AS rn
, tasks.show_date
, reports.how_did_gig_go
FROM tasks
INNER JOIN reports ON tasks.EID = reports.eid
CROSS JOIN ( SELECT @num := 0 AS n ) AS v
WHERE priority IS NOT NULL
AND partners_name IS NOT NULL
AND mike IS NOT NULL
ORDER BY tasks.show_date DESC
) AS d
But unless all the "numbers" are in fact numeric, you are doomed to sending every row back from the server for PHP to process, unless you can clean up the data in MySQL somehow.
You might avoid sending all that data twice if you establish a way for your PHP to use only the top 10 from the whole list; there are probably ways of doing that in PHP.
If you wanted assistance in SQL to do that, then maybe having two columns would help; it would reduce the number of table scans.
SELECT
EID
, how_did_gig_go
, CASE
WHEN rn <= 10 THEN how_did_gig_go
ELSE 0
END AS latest10_how_did_gig_go
FROM (
SELECT
@num := @num + 1 AS rn
, tasks.EID
, reports.how_did_gig_go
FROM tasks
INNER JOIN reports ON tasks.EID = reports.eid
CROSS JOIN ( SELECT @num := 0 AS n ) AS v
WHERE priority IS NOT NULL
AND partners_name IS NOT NULL
AND mike IS NOT NULL
ORDER BY tasks.show_date DESC
) AS d
In MySQL 8.x, ROW_NUMBER() OVER (ORDER BY tasks.show_date DESC) would be a better method than the roll-your-own row numbering (using @num := @num + 1) shown above.
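The ROW_NUMBER() approach can be sketched end to end (Python's sqlite3 as a stand-in for MySQL 8.x, since SQLite 3.25+ supports the same window function; the table and toy values below are made up, keeping only the column names from the question). AVG over a CASE that yields NULL outside the window neatly gives both averages in one pass:

```python
import sqlite3

# One query computes the all-time average and the average of the latest
# 10 reviews: ROW_NUMBER() ranks rows newest-first, and AVG() skips the
# NULLs produced by the CASE for older rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (show_date INTEGER, how_did_gig_go REAL)")
conn.executemany("INSERT INTO reviews VALUES (?, ?)",
                 [(i, float(i + 1)) for i in range(30)])  # toy data: 1..30
row = conn.execute("""
    SELECT AVG(how_did_gig_go) AS avg_all,
           AVG(CASE WHEN rn <= 10 THEN how_did_gig_go END) AS avg_latest10
    FROM (SELECT how_did_gig_go,
                 ROW_NUMBER() OVER (ORDER BY show_date DESC) AS rn
          FROM reviews) d
""").fetchone()
print(row)  # (15.5, 25.5)
```

Using AVG on the CASE (instead of SUM(...)/10) also stays correct when fewer than ten reviews exist.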
I'm having some problems with a query that finds the next ID of an order with certain filters on it, like that it should be from a specific city, etc.
Currently it's used for a function that spits out either the previous or the next ID based on the current order. So it can be either MIN(id) or MAX(id), where MAX(id) is obviously faster, since it has to go through fewer rows.
The query works just fine, but it's rather slow, since it goes through 123,953 rows to find the ID. Is there any way I could optimize this?
Function example:
SELECT $minmax(orders.orders_id) AS new_id FROM orders LEFT JOIN orders_status ON orders.orders_status = orders_status.orders_status_id $where_join WHERE orders_status.language_id = '$languages_id' AND orders.orders_date_finished != '1900-01-01 00:00:00' AND orders.orders_id $largersmaller $ordersid $where;
Live example
SELECT min(orders.orders_id)
FROM orders
LEFT JOIN orders_status ON orders.orders_status = orders_status.orders_status_id
WHERE orders_status.language_id = '4'
AND orders.orders_date_finished != '1900-01-01 00:00:00'
AND orders.orders_id < 4868771
LIMIT 1
so concluding:
SELECT orders.orders_id
FROM orders
JOIN orders_status ON orders.orders_status = orders_status.orders_status_id
WHERE orders_status.language_id = '4'
AND orders.orders_date_finished != '1900-01-01 00:00:00'
AND orders.orders_id < 4868771
ORDER BY orders.orders_id ASC
LIMIT 1
Extra:
to get the MAX value, use DESC where ASC is now.
And looking at your question: be sure to escape values like $languages_id etc.; I suppose they come from an HTML form?
(Or use prepared statements.)
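The ORDER BY plus LIMIT 1 pattern above, combined with the prepared-statement advice, can be sketched like this (Python's sqlite3 standing in for MySQL; a stripped-down orders table with made-up IDs, omitting the status join from the real query):

```python
import sqlite3

# Previous/next navigation: ORDER BY + LIMIT 1 lets the optimizer stop
# at the first qualifying row, and the ? placeholder is a parameterized
# query rather than string interpolation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (orders_id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO orders VALUES (?)", [(i,) for i in (10, 20, 30, 40)])

def neighbor_id(conn, current_id, direction):
    """Return the adjacent orders_id, or None at either end."""
    if direction == "next":
        sql = ("SELECT orders_id FROM orders WHERE orders_id > ? "
               "ORDER BY orders_id ASC LIMIT 1")
    else:
        sql = ("SELECT orders_id FROM orders WHERE orders_id < ? "
               "ORDER BY orders_id DESC LIMIT 1")
    row = conn.execute(sql, (current_id,)).fetchone()
    return row[0] if row else None

print(neighbor_id(conn, 20, "next"), neighbor_id(conn, 20, "prev"))  # 30 10
```

The same two statements map directly onto the answer's concluding query (ASC for next, DESC for previous).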
I’m designing a program for my school to keep student attendance records. So far I have the following query working fine and now I would like to add an IF statement to perform a percentage operation when a certain condition is given. As it is, the query is using INNER JOIN to search for data from two different tables (oxadmain and stuattend) and it’s displaying the results well on a results table:
SELECT o.name
, o.year
, o.photoID
, o.thumbs
, s.ID
, s.studid
, s.date
, s.teacher
, s.subject
, s.attendance
FROM stuattend s
JOIN oxadmain o
ON s.studid = o.stuid
ORDER
BY name ASC
Now I would like to add an “if” statement that
1) when stuattend.attendance = 'Absent', calculates the percentage of absences the student has in any given period of time and stores that (%) value in “Percentage”, and
2) otherwise assigns the value 100% to “Percentage”.
So far I’ve been trying with the following:
<?php $_GET['studentID'] = $_row_RepeatedRS['WADAstuattend']; ?>
SELECT oxadmain.name , oxadmain.year , oxadmain.photoID , oxadmain.thumbs , stuattend.ID , stuattend.studid , stuattend.date , stuattend.teacher, stuattend.subject , stuattend.attendance
CASE
WHEN stuattend.attendance = Absent THEN SELECT Count (studentID) AS ClassDays, (SELECT Count(*) FROM stuattend WHERE studentID = stuattend.studid AND Absent = 1) AS ClassAbsent, ROUND ((ClassAbsent/ClassDays)*100, 2) AS Percentage
ELSE
Percentage = 100
END
FROM stuattend INNER JOIN oxadmain ON stuattend.studid=oxadmain.stuid
ORDER BY name ASC
Any suggestions on how to do this well?
Thank you for your attention
The base idea would be:
select stuattend.studid, sum(stuattend.attendance = 'Absent') / count(*)
from stuattend
group by stuattend.studid;
This very much depends on there being exactly one entry per student per day, and of course gives 0 if never absent and 1 if always absent.
To make this a bit more stable, I would suggest creating a calendar table that simply keeps a list of all days, with a column marking whether each is a school day: workday = 1 means the student should have been there, workday = 0 means a Sunday or holiday. Then you could LEFT JOIN from this table to the presence/absence records, which would even give good results when presence rows are missing from your table.
Just ask if you decide which way to go.
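The base idea runs as-is once the literal is quoted; here is a sketch with Python's sqlite3 standing in for MySQL and two invented students (in both engines, summing a boolean comparison counts the matching rows):

```python
import sqlite3

# SUM(attendance = 'Absent') counts absences per group, so absence rate
# is a single aggregate per student; compare against the string literal
# 'Absent', not a backticked identifier.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuattend (studid INTEGER, attendance TEXT)")
conn.executemany("INSERT INTO stuattend VALUES (?, ?)", [
    (1, "Present"), (1, "Absent"), (1, "Present"), (1, "Present"),
    (2, "Absent"),  (2, "Absent"),
])
rows = conn.execute("""
    SELECT studid,
           ROUND(100.0 * SUM(attendance = 'Absent') / COUNT(*), 2) AS pct_absent
    FROM stuattend
    GROUP BY studid
    ORDER BY studid
""").fetchall()
print(rows)  # [(1, 25.0), (2, 100.0)]
```

Multiplying by 100.0 (not 100) forces floating-point division, so the percentage isn't truncated by integer arithmetic.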
I'm working with the join plus union plus group by query, and I developed a query something like mentioned below:
SELECT *
FROM (
(SELECT countries_listing.id,
countries_listing.country,
1 AS is_country
FROM countries_listing
LEFT JOIN product_prices ON (product_prices.country_id = countries_listing.id)
WHERE countries_listing.status = 'Yes'
AND product_prices.product_id = '3521')
UNION
(SELECT countries_listing.id,
countries_listing.country,
0 AS is_country
FROM countries_listing
WHERE countries_listing.id NOT IN
(SELECT country_id
FROM product_prices
WHERE product_id='3521')
AND countries_listing.status='Yes')) AS partss
GROUP BY id
ORDER BY country
And I just realised that this query takes a lot of time to load results, almost 8 seconds.
I was wondering whether there is a way to optimize this query to make it faster?
If I understand the logic correctly, you just want to add a flag for the country as to whether or not there is a price for a given product. I think you can use an exists clause to get what you want:
SELECT cl.id, cl.country,
(exists (SELECT 1
FROM product_prices pp
WHERE pp.country_id = cl.id AND
pp.product_id = '3521'
)
) as is_country
FROM countries_listing cl
WHERE cl.status = 'Yes'
ORDER BY country;
For performance, you want two indexes: countries_listing(status, country) and product_prices(country_id, product_id).
Depending on how often it is executed, prepared statements could help. See PDO for more information.
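A runnable sketch of the EXISTS-as-flag rewrite (Python's sqlite3 as a stand-in for MySQL, with two invented countries and one price row): each country gets is_country = 1 exactly when a price row exists for the product, with no UNION, subquery-IN, or GROUP BY needed.

```python
import sqlite3

# EXISTS returns 1/0 per country depending on whether a matching
# product_prices row is found, replacing the UNION of two SELECTs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries_listing (id INTEGER, country TEXT, status TEXT);
CREATE TABLE product_prices (country_id INTEGER, product_id TEXT);
INSERT INTO countries_listing VALUES (1, 'Austria', 'Yes'), (2, 'Belgium', 'Yes');
INSERT INTO product_prices VALUES (1, '3521');
""")
rows = conn.execute("""
    SELECT cl.id, cl.country,
           EXISTS (SELECT 1 FROM product_prices pp
                   WHERE pp.country_id = cl.id
                     AND pp.product_id = '3521') AS is_country
    FROM countries_listing cl
    WHERE cl.status = 'Yes'
    ORDER BY cl.country
""").fetchall()
print(rows)  # [(1, 'Austria', 1), (2, 'Belgium', 0)]
```

With the two suggested indexes in place, the EXISTS probe is a single index lookup per country instead of a scan.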
I have a table in a MySQL database (level_records) which has 3 columns (id, date, reading). I want to put the differences between the most recent 20 readings (by date) into an array and then average them, to find the average difference.
I have looked everywhere, but no one seems to have a scenario quite like mine.
I will be very grateful for any help. Thanks.
SELECT AVG(difference)
FROM (
SELECT @next_reading - reading AS difference, @next_reading := reading
FROM (SELECT reading
FROM level_records
ORDER BY date DESC
LIMIT 20) AS recent20
CROSS JOIN (SELECT @next_reading := NULL) AS var
) AS recent_diffs
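The user-variable trick is MySQL-specific; in MySQL 8.x (or SQLite 3.25+) the same consecutive differences fall out of the LAG() window function. A sketch in Python with sqlite3 and four made-up readings:

```python
import sqlite3

# LAG(reading) pairs each of the latest-20 readings with its predecessor;
# the first row's difference is NULL and AVG() skips it, so 20 readings
# yield 19 differences.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE level_records (id INTEGER, date TEXT, reading REAL)")
conn.executemany("INSERT INTO level_records VALUES (?, ?, ?)", [
    (1, "2024-01-01", 10.0), (2, "2024-01-02", 13.0),
    (3, "2024-01-03", 11.0), (4, "2024-01-04", 17.0),
])
avg_diff = conn.execute("""
    SELECT AVG(diff) FROM (
        SELECT reading - LAG(reading) OVER (ORDER BY date) AS diff
        FROM (SELECT date, reading FROM level_records
              ORDER BY date DESC LIMIT 20) recent
    ) d
""").fetchone()[0]
print(avg_diff)  # approximately 2.3333, i.e. (3 + (-2) + 6) / 3
```

Unlike user variables, the window version doesn't depend on MySQL's (undocumented) evaluation order of expressions in the SELECT list.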
If we consider "differences" to be signed, and if we ignore/exclude any rows that have NULL values of reading...
If you want to return just the values of the difference between a reading and the immediately preceding reading (to get the latest nineteen differences), then you could do something like this:
SELECT d.diff
FROM ( SELECT e.reading - @prev_reading AS diff
, @prev_reading AS prev_reading
, @prev_reading := e.reading AS reading
FROM ( SELECT r.date
, r.reading
FROM level_records r
CROSS
JOIN (SELECT @prev_reading := NULL) p
ORDER BY r.date DESC
LIMIT 20
) e
ORDER BY e.date ASC
) d
That'll get you the rows returned from MySQL, and you can monkey with them in PHP however you want. (The question of how to monkey around with arrays in PHP doesn't really have anything to do with MySQL.)
If you want to know how to return rows from a SQL resultset into a PHP array, that doesn't really have anything to do with "latest twenty", "difference", or "average" at all. You'd use the same pattern you'd use for returning the result from any query. There's nothing at all unique about that, and there are plenty of examples (most of them unfortunately using the deprecated mysql_ interface; for new development, you want either PDO or mysqli_).
If you mean by "all 19 sets of differences" that you want to get the difference between a reading and every other reading, and do that for each reading, such that you get a total of 380 rows ( = 20 * (20-1) rows ) then:
SELECT a.reading - b.reading AS diff
, a.id AS a_id
, a.date AS a_date
, a.reading AS a_reading
, b.id AS b_id
, b.date AS b_date
, b.reading AS b_reading
FROM ( SELECT aa.id
, aa.date
, aa.reading
FROM level_record aa
WHERE aa.reading IS NOT NULL
ORDER BY aa.date DESC, aa.id DESC
LIMIT 20
) a
JOIN ( SELECT bb.id
, bb.date
, bb.reading
FROM level_record bb
WHERE bb.reading IS NOT NULL
ORDER BY bb.date DESC, bb.id DESC
LIMIT 20
) b
WHERE a.id <> b.id
ORDER BY a.date DESC, b.date DESC
Sometimes we only want differences in one direction; that is, if we have the difference between r13 and r15, we essentially already have the inverse, the difference between r15 and r13. And sometimes it's more convenient to have the inverse copies.
What query you run really depends on what result set you want returned.
If the goal is to get an "average", then rather than monkeying with PHP arrays, we can note that the average of the differences between the latest twenty readings is the same as the difference between the first and last readings (of the latest twenty), divided by nineteen.
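That telescoping claim is easy to verify with a few lines of Python on arbitrary invented readings: the intermediate terms of the sum of consecutive differences cancel, leaving only last minus first.

```python
# Sum of consecutive differences telescopes:
# (r1-r0) + (r2-r1) + ... + (r19-r18) = r19 - r0,
# so the average of the 19 differences is (r19 - r0) / 19.
readings = [float(i * i % 37) for i in range(20)]  # arbitrary toy data
diffs = [b - a for a, b in zip(readings, readings[1:])]
avg_of_diffs = sum(diffs) / len(diffs)
telescoped = (readings[-1] - readings[0]) / 19
print(avg_of_diffs, telescoped)  # the two values are equal
```

This is why the query below only needs the first and last of the latest twenty readings, not all twenty.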
If we only want to return a row if there are at least twenty readings available, then something like this:
SELECT (l.reading - f.reading)/19 AS avg_difference
FROM ( SELECT ll.reading
FROM level_records ll
WHERE ll.reading IS NOT NULL
ORDER BY ll.date DESC LIMIT 1
) l
CROSS
JOIN (SELECT ff.reading
FROM level_records ff
WHERE ff.reading IS NOT NULL
ORDER BY ff.date DESC LIMIT 19,1
) f
NOTE: That query will return a row only if there are at least twenty rows with non-NULL values of reading in the level_records table.
For the more general case, if there are fewer than twenty rows in the table (i.e. fewer than nineteen differences) and we want an average of the differences between the latest available rows, we can do something like this:
SELECT (l.reading - f.reading)/f.cnt AS avg_difference
FROM ( SELECT ll.reading
FROM level_records ll
WHERE ll.reading IS NOT NULL
ORDER BY ll.date DESC
LIMIT 1
) l
CROSS
JOIN (SELECT ee.reading
, ee.cnt
FROM ( SELECT e.date
, e.reading
, (@i := @i + 1) AS cnt
FROM level_records e
CROSS
JOIN (SELECT @i := -1) i
WHERE e.reading IS NOT NULL
ORDER BY e.date DESC
LIMIT 20
) ee
ORDER BY ee.date ASC
LIMIT 1
) f
But if we need to treat "differences" as unsigned (that is, we take the absolute value of the differences between readings), then we need to get the actual differences between adjacent readings and average their absolute values.
We can make use of a MySQL user variable to keep track of the "previous" reading, so that it's available when we process the next row and we can take the difference, something like this:
SELECT AVG(d.abs_diff)
FROM ( SELECT ABS(e.reading - @prev_reading) AS abs_diff
, @prev_reading AS prev_reading
, @prev_reading := e.reading AS reading
FROM ( SELECT r.date
, r.reading
FROM level_records r
CROSS
JOIN (SELECT @prev_reading := NULL) p
ORDER BY r.date DESC
LIMIT 20
) e
ORDER BY e.date ASC
) d
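For the unsigned case too, user variables can be avoided in MySQL 8.x or SQLite 3.25+ by wrapping the windowed difference in ABS(). A sketch in Python with sqlite3 and the same kind of toy readings (this is a rewrite of the idea, not the answer's exact MySQL):

```python
import sqlite3

# Unsigned average: take ABS() of each LAG()-based consecutive difference
# before averaging; the telescoping shortcut no longer applies here,
# because absolute values don't cancel.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE level_records (date TEXT, reading REAL)")
conn.executemany("INSERT INTO level_records VALUES (?, ?)", [
    ("2024-01-01", 10.0), ("2024-01-02", 13.0),
    ("2024-01-03", 11.0), ("2024-01-04", 17.0),
])
avg_abs_diff = conn.execute("""
    SELECT AVG(ABS(diff)) FROM (
        SELECT reading - LAG(reading) OVER (ORDER BY date) AS diff
        FROM (SELECT date, reading FROM level_records
              WHERE reading IS NOT NULL
              ORDER BY date DESC LIMIT 20) recent
    ) d
""").fetchone()[0]
print(avg_abs_diff)  # approximately 3.6667, i.e. (3 + 2 + 6) / 3
```

Note the signed average of the same data is 7/3 while the unsigned average is 11/3, which is exactly why the two cases need different queries.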