I'm trying to optimize a report query run on an ecommerce site. I'm pretty sure that I'm doing something stupid, since this query shouldn't be taking nearly as long to run as it does.
The query in question is:
SELECT inventories_name, inventories_code, SUM(shop_orders_inventories_qty) AS qty,
SUM(shop_orders_inventories_price) AS tot_price, inventories_categories_name,
inventories_price_list, inventories_id
FROM shop_orders
LEFT JOIN shop_orders_inventories ON (shop_orders_id = join_shop_orders_id)
LEFT JOIN inventories ON (join_inventories_id = inventories_id)
WHERE {$date_type} BETWEEN '{$start_date}' AND '{$end_date}'
AND shop_orders_x_response_code = 1
GROUP BY join_inventories_id, join_shop_categories_id
{$order}
{$limit}
It's basically trying to get total sales per item over a period of time; values in curly brackets are filled in via a form. It works fine for a period of a couple days, but querying a time interval of a week or more can take 30 seconds+.
I feel like it's joining way too many rows in order to calculate the aggregate values and sucking up huge amounts of memory, but I'm not sure how to limit it.
Note - I realize that I'm selecting fields which aren't in the group by, but they correspond 1-1 with inventory ID, which is in the group by.
Any suggestions?
-- Edit --
The current indices are:
inventories:
join_categories - BTREE
inventories_name, inventories_code, inventories_description - FULLTEXT
shop_orders_inventories:
shop_orders_inventories_id - BTREE
shop_orders:
shop_orders_id - BTREE
Two sequential LEFT JOINs will take a long time on a big table. Try using JOIN instead of LEFT JOIN (unless you have records in shop_orders with no matching records in shop_orders_inventories or inventories), or split this query into a couple of smaller ones. Also, by using SUM and GROUP BY you are forcing MySQL to create temporary tables; you might want to raise the relevant MySQL limits so those tables fit in memory (otherwise MySQL will dump them to disk, which will also increase execution time).
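On the temporary-table point, a hedged example of raising the in-memory limits for just the current session (the values are illustrative, not recommendations; MySQL uses the smaller of the two variables as the effective cap):
SET SESSION tmp_table_size = 67108864;      -- 64 MB, illustrative
SET SESSION max_heap_table_size = 67108864; -- 64 MB, illustrative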
The first and foremost rule to indexing is... index the columns that you will search on!
For each possible value of {$date_type}, create an index for that date column.
Once you have lots of data in the table (say 2 years or 100 weeks), a single week's data is 1% of the index, so it becomes a good starting point.
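For example (the column names below are placeholders, since the actual date columns behind {$date_type} aren't shown in the question):
ALTER TABLE shop_orders
  ADD INDEX idx_order_date (shop_orders_date_placed),   -- placeholder column name
  ADD INDEX idx_ship_date  (shop_orders_date_shipped);  -- placeholder column name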
Even though MySQL allows non-aggregated columns in the SELECT clause, I personally would keep the SELECT list and the GROUP BY list in sync:
SELECT inventories_name, inventories_code,
SUM(shop_orders_inventories_qty) AS qty,
SUM(shop_orders_inventories_price) AS tot_price,
inventories_categories_name, inventories_price_list, inventories_id
FROM ...
GROUP BY inventories_id, join_shop_categories_id, inventories_name,
inventories_code, inventories_categories_name, inventories_price_list
...
I want to fetch records from MySQL from last to first, LIMIT 20. My database has over 1M records. I am aware of ORDER BY, but from my understanding, when using ORDER BY it takes forever to load even 20 records; I have no idea why, but I think MySQL fetches all the records before ordering them.
SELECT bookings.created_at, bookings.total_amount,
passengers.name, passengers.id_number, payments.amount,
passengers.ticket_no,bookings.phone,bookings.source,
bookings.destination,bookings.date_of_travel FROM bookings
INNER JOIN passengers ON bookings.booking_id = passengers.booking_id
INNER JOIN payments on payments.booking_id = bookings.booking_id
ORDER BY bookings.booking_id DESC LIMIT 10
I suppose if you execute the query without the ORDER BY, the time would be satisfactory?
You might try creating an index on the column you are ordering by:
create index idx_bookings_booking_id on bookings(booking_id)
You can try to find out the complexity of the query using EXPLAIN:
EXPLAIN SELECT bookings.created_at, bookings.total_amount,
passengers.name, passengers.id_number, payments.amount,
passengers.ticket_no,bookings.phone,bookings.source,
bookings.destination,bookings.date_of_travel FROM bookings
INNER JOIN passengers ON bookings.booking_id = passengers.booking_id
INNER JOIN payments on payments.booking_id = bookings.booking_id
ORDER BY bookings.booking_id DESC LIMIT 10
Then check that the proper index has been created on the table:
SHOW INDEX FROM `db_name`.`table_name`;
If the index is not there, create the proper index on the relevant tables.
Please add a comment if anything is missing.
The index lookup table needs to be able to reside in memory, if I'm not mistaken (filesort is much slower than in-mem lookup).
Use small index / column sizes.
For double the capacity, use UNSIGNED columns if you don't need negative values.
Tune sort_buffer_size and read_rnd_buffer_size (preferably at the connection/session level, not globally).
See https://dev.mysql.com/doc/refman/5.7/en/order-by-optimization.html, particularly regarding using EXPLAIN and perhaps trying another execution plan strategy.
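For the buffer tuning mentioned above, a session-level example (the values are illustrative, not recommendations):
SET SESSION sort_buffer_size = 4194304;     -- 4 MB, illustrative
SET SESSION read_rnd_buffer_size = 1048576; -- 1 MB, illustrative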
You seem to need another workaround like materialized views.
Tell me if this sounds like it:
Create another table like the bookings table, e.g. CREATE TABLE booking_short LIKE bookings, though you only need the booking_id column.
Then check your code for where exactly you create bookings, i.e. where you first insert into bookings. There, run SELECT COUNT(*) FROM booking_short; if it is > 20, delete the oldest record, then insert the new booking_id (see the sketch below).
For the report, you can select the IDs from booking_short and join from there to the rest of the tables for the details.
You won't need LIMIT or sorting.
Of course, this needs heavy documentation to avoid maintenance problems.
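A minimal sketch of that bookkeeping, assuming booking_short holds only the booking_id column and 20 recent IDs are enough (table and column names come from the question; the exact statements are only one way you might wire it up):
-- One-time setup: a tiny table holding only the most recent booking_ids
CREATE TABLE booking_short (booking_id INT UNSIGNED NOT NULL PRIMARY KEY);
-- On every new booking (in application code or a trigger):
INSERT INTO booking_short (booking_id) VALUES (12345);  -- the new booking's id
DELETE FROM booking_short
WHERE booking_id < (SELECT MIN(t.booking_id)
                    FROM (SELECT booking_id FROM booking_short
                          ORDER BY booking_id DESC LIMIT 20) AS t);
-- The report then starts from the small table instead of sorting the big one:
SELECT b.created_at, b.total_amount, b.phone, b.source, b.destination, b.date_of_travel
FROM booking_short s
JOIN bookings b ON b.booking_id = s.booking_id
JOIN passengers p ON p.booking_id = b.booking_id
JOIN payments pay ON pay.booking_id = b.booking_id;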
Either that or https://stackoverflow.com/a/5912827/6288442
Say I have a table with 1million rows. One column lists the "Group", and another lists "Sales". The Group #'s range from 1 to 100,000 such that each Group has about 10 Sales entries. I want to somehow summarize the data into 100,000 rows with the sum of Sales for each group rather than each individual sale.
My method so far has been to run a PHP loop from 1 to 100,000 where each iteration sends an SQL query for SUM(Sales) WHERE Group = $i. Then I can either echo the result into an HTML table or insert it into a new SQL table. The problem is that this method takes hours.
Any tips on how I can improve this process? Is there a way to write this as a single SQL query that will massively increase speed? Thanks
Just try a GROUP BY:
SELECT `group`, sum(sales)
FROM your_table
GROUP BY `group`
Edited to include backticks around group; without them you will receive an error, because GROUP is a reserved word in MySQL.
You should always avoid running a SQL query in a loop unless there's no other solution. In this case, you can grab all the rows at once, hold them in an array, and add them up in PHP that way.
I have to select 4 rows randomly from a table.
Is it better to randomly generate 4 IDs and perform 4 requests like SELECT column FROM table WHERE id = ...?
Or to select all the rows in one request and choose from them afterwards?
If you are capable of generating random existing IDs, I think the best approach is to use a clause like WHERE id IN (id1, id2, id3, id4). This will return the 4 records in one query, so no unnecessary queries or records are fetched.
As noted before, WHERE id IN (id1, id2, id3, id4) is the fastest way from the MySQL perspective. However, you will need some logic in the application to generate those IDs: all 4 IDs must exist, they must be randomly distributed, and you want to avoid duplicates. In the worst case you would be retrieving a list of all existing IDs with a huge query, extracting 4 random values, and querying again.
With all that logic to implement, it can be wise to move the selection into MySQL:
SELECT * FROM foobar
ORDER BY RAND()
LIMIT 4;
You must understand that this is slow in MySQL, but you gain speed in the application logic and can be sure to get random values spread evenly over your table.
EDIT:
A comment asks whether PHP is faster at this task than MySQL. The answer is no.
It is not done just by "using rand": you need an array containing all those IDs in PHP. That means a huge query, lots of TCP traffic, a huge array to be built in PHP, and a huge B-tree to be built by the Zend engine. Then, with the IDs, you must fire a second query to get the rows for those IDs.
Although the RAND() function may be slow, so far I have not had significant problems with speed. My strategy is actually to join the table back to a subquery of itself that returns a list of random IDs with a LIMIT:
SELECT *
FROM table AS t1
JOIN (
SELECT rowID
FROM table
ORDER BY RAND()
LIMIT 4
) AS t2
WHERE t1.rowID = t2.rowID
There is also a more robust solution that exists; try checking out this question (asked in 2010).
I'm having a terrible time with a MySQL query. I've spent most of my weekend and most of my day today attempting to make this query run a bit faster. I've made it considerably faster, but I know I can make it better.
SELECT m.id,other_fields,C.contacts_count FROM marketingDatabase AS m
LEFT OUTER JOIN
(SELECT COUNT(*) as contacts_count, rid
FROM contacts
WHERE status = 'Active' AND install_id = 'XXXX' GROUP BY rid) as C
ON C.rid = m.id
WHERE (RAND()*2612<50)
AND do_not_call != 'true'
AND `ACTUAL SALES VOLUME` >= '800000'
AND `ACTUAL SALES VOLUME` <= '1200000'
AND status = 'Pending'
AND install_id = 'XXXXX'
ORDER BY RAND()
I have an index on 'install_id', 'category' and 'status' but the EXPLAIN shows it was sorting based on 9100 rows.
My Explain is here:
https://s3.amazonaws.com/jas-so-question/Screen+Shot+2012-03-13+at+12.34.04+AM.png
Anybody have any suggestions on what I can do to make this a bit faster? The entire point of the query is to select a random record from an account's records (install_id) that matches certain criteria like sales volume, status and do_not_call. I'm currently gathering 25 records and caching it (using PHP) so I only have to run this query once every 25 requests, but I'm already dealing with thousands of requests per day. It currently takes 0.2 seconds to run. I realize that by using ORDER BY RAND() I'm already taking a major performance hit, but it's just sorting 25 rows.
Thanks in advance for the help.
**EDIT: I forgot to mention that the 'contact_sort' index is on the 'contacts' table, and indexes install_id, status, and rid. (rid references the record ID in marketingDatabase so it knows which record a contact belongs to.)
**EDIT 2: The 2612 number in the query represents the number of rows in marketingDatabase that match the criteria (install_id, status, actual sales volume, etc.)
Since I do not see your index definitions, I am not sure they are correct. The query would benefit from the following indexes:
a composite index (install_id, status, rid) on the contacts
a composite index (install_id, status, `ACTUAL SALES VOLUME`) on marketingDatabase
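In DDL form, something like this (the index names are just illustrative):
CREATE INDEX idx_contacts_install_status_rid
  ON contacts (install_id, status, rid);
CREATE INDEX idx_marketing_install_status_volume
  ON marketingDatabase (install_id, status, `ACTUAL SALES VOLUME`);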
I played around with a few queries, and I don't think you'll ever be able to get an indexed query to work with RAND(), especially when you're using it in both a WHERE clause and an ORDER BY clause. If at all possible, I'd introduce the random element in my PHP logic, and probably look at whether two simple queries make more sense than one fairly complex one. Added to that, you have a LEFT OUTER JOIN onto a random result set, which may also be adding a lot to the work that has to be done.
In summary, my guess would be: rewrite to exclude RAND(), and see if you can get rid of the LEFT OUTER JOIN. Two straightforward indexed queries with a bit of PHP in between may be a lot better, roughly as sketched below.
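A hedged sketch of that two-step approach, reusing the columns from the question (splitting it exactly this way is just one possible reading of the advice):
-- Step 1: fetch only the candidate ids (cheap with a composite index like the one suggested above)
SELECT id
FROM marketingDatabase
WHERE install_id = 'XXXXX'
  AND status = 'Pending'
  AND `ACTUAL SALES VOLUME` BETWEEN 800000 AND 1200000
  AND do_not_call != 'true';
-- Step 2: pick 25 ids at random in PHP (e.g. array_rand), then fetch just those rows
SELECT m.id, m.other_fields, C.contacts_count
FROM marketingDatabase AS m
LEFT OUTER JOIN (
  SELECT COUNT(*) AS contacts_count, rid
  FROM contacts
  WHERE status = 'Active' AND install_id = 'XXXXX'
  GROUP BY rid
) AS C ON C.rid = m.id
WHERE m.id IN (/* the randomly chosen ids */);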
So I have the following query:
SELECT DISTINCT d.iID1 as 'id',
SUM(d.sum + d.count*r.lp)/sum(d.count) AS avgrat
FROM abcd i, abce r, abcf d
WHERE r.aID = 1 AND
d.iID1 <> r.rID AND d.iID2 = r.rID GROUP BY d.iID1
ORDER BY avgrat LIMIT 50;
The problem is that with millions of entries in the table, SUM() and GROUP BY freeze up the query. Is there a way to do exactly this that executes near-instantaneously using MySQL and/or PHP hacks (perhaps doing the summing in PHP, but how would I go about that)?
To answer the direct question: no, there is no way to do anything instantaneously.
If you have control over the table updates, or the application which adds the relevant records, then you could add logic which updates another table with the sum, count, and id with each update. Then a revised query targets the "sum table" and trivially calculates the averages.
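As a hedged sketch, suppose the application maintains a hypothetical item_totals(iID1, total_sum, total_count) table; the report then becomes trivial (this simplified version ignores the r.lp adjustment from the original query and only illustrates the pattern):
SELECT iID1 AS id,
       total_sum / total_count AS avgrat
FROM item_totals
ORDER BY avgrat
LIMIT 50;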
One solution is to create a rollup table that holds your aggregate values,
using triggers on your source tables to keep it up to date (a sketch follows the list below).
You will need to decide whether the overhead of the triggers is less than that of the query.
Some important factors are:
The frequency of the source table updates
The run frequency of the aggregate query.
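A minimal sketch of the trigger approach, reusing the hypothetical item_totals table from the answer above (column names come from the question's query; the r.lp term is again left out for brevity):
CREATE TABLE item_totals (
  iID1        INT    NOT NULL PRIMARY KEY,
  total_sum   BIGINT NOT NULL DEFAULT 0,
  total_count BIGINT NOT NULL DEFAULT 0
);
DELIMITER //
CREATE TRIGGER abcf_rollup_ai AFTER INSERT ON abcf
FOR EACH ROW
BEGIN
  -- Keep a running sum and count per iID1 as rows arrive in abcf
  INSERT INTO item_totals (iID1, total_sum, total_count)
  VALUES (NEW.iID1, NEW.`sum`, NEW.`count`)
  ON DUPLICATE KEY UPDATE
    total_sum   = total_sum   + NEW.`sum`,
    total_count = total_count + NEW.`count`;
END//
DELIMITER ;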