trick to get the sum of MySQL rows outside of MySQL - php

so I have the following query:
SELECT DISTINCT d.iID1 AS 'id',
       SUM(d.sum + d.count*r.lp)/SUM(d.count) AS avgrat
FROM abcd i, abce r, abcf d
WHERE r.aID = 1
  AND d.iID1 <> r.rID
  AND d.iID2 = r.rID
GROUP BY d.iID1
ORDER BY avgrat LIMIT 50;
The problem is that with millions of entries in the table, SUM() and GROUP BY freeze up the query. Is there a way to do exactly this that executes near-instantaneously, using MySQL and/or PHP hacks (perhaps doing the summing in PHP, but how would I go about that)?

To answer the direct question: no, there is no way to do anything instantaneously.
If you have control over the table updates, or the application which adds the relevant records, then you could add logic which updates another table with the sum, count, and id with each update. Then a revised query targets the "sum table" and trivially calculates the averages.
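A minimal sketch of that idea. The table and column names here (rating_sums, total_sum, total_count) are my own, not from the question:

-- Hypothetical rollup table: one row of running totals per item id.
CREATE TABLE rating_sums (
    id          INT PRIMARY KEY,
    total_sum   DOUBLE NOT NULL DEFAULT 0,
    total_count INT    NOT NULL DEFAULT 0
);

-- Wherever the application inserts a source record, it also upserts here:
INSERT INTO rating_sums (id, total_sum, total_count)
VALUES (42, 3.5, 1)
ON DUPLICATE KEY UPDATE
    total_sum   = total_sum + VALUES(total_sum),
    total_count = total_count + VALUES(total_count);

-- The revised query then scans one small table instead of millions of rows:
SELECT id, total_sum / total_count AS avgrat
FROM rating_sums
ORDER BY avgrat
LIMIT 50;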

One solution is to create a rollup table that holds your aggregate values,
using triggers on your source tables to keep it up to date; a sketch follows below.
You will need to decide whether the overhead of the triggers is less than that of the query.
Some important factors are:
The frequency of updates to the source tables
The run frequency of the aggregate query.
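A hedged sketch of the trigger approach, reusing the hypothetical rating_sums table from the previous answer and assuming a source table ratings(id, `sum`, `count`) as a stand-in for the question's schema:

DELIMITER //
CREATE TRIGGER ratings_rollup_ai AFTER INSERT ON ratings
FOR EACH ROW
BEGIN
    -- Keep the running totals current on every insert into the source table.
    INSERT INTO rating_sums (id, total_sum, total_count)
    VALUES (NEW.id, NEW.`sum`, NEW.`count`)
    ON DUPLICATE KEY UPDATE
        total_sum   = total_sum + NEW.`sum`,
        total_count = total_count + NEW.`count`;
END//
DELIMITER ;

Matching AFTER UPDATE and AFTER DELETE triggers are needed if source rows can change or disappear.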

Related

MySQL fetch from last to first - [many records]

I want to fetch records from MySQL starting from last to first, LIMIT 20. My database has over 1M records. I am aware of ORDER BY, but from my understanding, when using ORDER BY it takes forever to load 20 records and I have no freaking idea why. I think MySQL fetches all the records before ordering.
SELECT bookings.created_at, bookings.total_amount,
       passengers.name, passengers.id_number, payments.amount,
       passengers.ticket_no, bookings.phone, bookings.source,
       bookings.destination, bookings.date_of_travel
FROM bookings
INNER JOIN passengers ON bookings.booking_id = passengers.booking_id
INNER JOIN payments ON payments.booking_id = bookings.booking_id
ORDER BY bookings.booking_id DESC LIMIT 10
I suppose that if you execute the query without the ORDER BY, the time is satisfactory?
You might try creating an index on the column you are ordering by:
create index idx_bookings_booking_id on bookings(booking_id)
You can check the query's execution plan using:
EXPLAIN SELECT bookings.created_at, bookings.total_amount,
       passengers.name, passengers.id_number, payments.amount,
       passengers.ticket_no, bookings.phone, bookings.source,
       bookings.destination, bookings.date_of_travel
FROM bookings
INNER JOIN passengers ON bookings.booking_id = passengers.booking_id
INNER JOIN payments ON payments.booking_id = bookings.booking_id
ORDER BY bookings.booking_id DESC LIMIT 10
Then check that the proper index has been created on the table:
SHOW INDEX FROM `db_name`.`table_name`;
If the index is not there, create the proper indexes on all the tables involved.
Please add if anything is missing.
The index lookup table needs to be able to reside in memory, if I'm not mistaken (filesort is much slower than an in-memory lookup). Some options:
Use small index / column sizes.
For double the capacity, use UNSIGNED columns if you don't need negative values.
Tune sort_buffer_size and read_rnd_buffer_size (perhaps better at the connection level, not globally); see the sketch after this list.
See https://dev.mysql.com/doc/refman/5.7/en/order-by-optimization.html , particularly regarding using EXPLAIN and maybe trying another execution-plan strategy.
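For the buffer tuning mentioned above, connection-level settings look like this (the sizes are illustrative only; benchmark before adopting them):

SET SESSION sort_buffer_size     = 2097152;  -- 2 MB, this connection only
SET SESSION read_rnd_buffer_size = 1048576;  -- 1 MB, this connection only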
You seem to need another workaround, like materialized views.
Tell me if this sounds like it:
Create another table like the booking table, e.g. CREATE TABLE booking_short LIKE booking, though you only need the booking_id column.
Then check your code for where exactly you create booking orders, i.e. where you first insert into booking. At that point, SELECT COUNT(*) FROM booking_short; if it is > 20, delete the oldest record. Then insert the new booking_id.
You can select the IDs from that table and join from there for the details in the rest of the tables.
You won't need LIMIT or sorting.
Of course, this needs heavy documentation to avoid maintenance problems.
Either that or https://stackoverflow.com/a/5912827/6288442
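If I've read the idea right, a sketch (table and column names follow the question; the trim step shown is just one way to do it):

-- A narrow table holding only the newest booking ids; cheap to scan.
CREATE TABLE booking_short (
    booking_id INT PRIMARY KEY
);

-- Wherever the application first inserts into booking, also do:
INSERT INTO booking_short (booking_id) VALUES (12345);  -- the new booking's id

-- Then trim to the newest 20 ids (the derived table avoids MySQL's
-- restriction on modifying a table that the same statement reads):
DELETE FROM booking_short
WHERE booking_id NOT IN (
    SELECT booking_id FROM (
        SELECT booking_id FROM booking_short
        ORDER BY booking_id DESC
        LIMIT 20
    ) AS newest
);

The report query then joins from booking_short to the other tables, with no ORDER BY or LIMIT needed.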

Quicker way to sum sections of an SQL column

Say I have a table with 1 million rows. One column lists the "Group", and another lists "Sales". The Group #'s range from 1 to 100,000, such that each Group has about 10 Sales entries. I want to somehow summarize the data into 100,000 rows with the sum of Sales for each group, rather than each individual sale.
My method so far has been to run a PHP loop from 1 to 100,000 where each iteration sends an SQL query for SUM(Sales) WHERE Group=$i. Then I can either echo it into an HTML table, or insert it into a new SQL table. The problem is that this method takes hours.
Any tips on how I can improve this process? Is there a way to write this as a single SQL query that will massively increase speed? Thanks
Just try a GROUP BY:
SELECT `group`, sum(sales)
FROM your_table
GROUP BY `group`
Edited to include backticks around group; without them you will receive an error, since GROUP is a reserved word in MySQL.
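And if you'd rather persist the result than echo it, a single INSERT ... SELECT does it in one pass (the target table group_totals is my own name, not from the question):

CREATE TABLE group_totals (
    `group`     INT PRIMARY KEY,
    total_sales DECIMAL(12,2)
);

-- One statement replaces the 100,000-iteration PHP loop.
INSERT INTO group_totals (`group`, total_sales)
SELECT `group`, SUM(sales)
FROM your_table
GROUP BY `group`;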
You should always avoid running a SQL query in a loop unless there's no other solution. In this case, you can grab all the rows at once, hold them in an array, and add them up in PHP that way.

MySQL SUM() giving incorrect total

I am developing a php/mysql database. I have two tables - matters and actions.
Amongst other fields, the matters table contains 'matterid', 'fixedfee' and 'fee'. Fixed fee is Y or N, and the fee can be any number.
For any matter there can be a number of actions. The actions table contains 'actionid', 'matterid', 'advicetime' and 'advicefee'. The advicetime is how long the advice goes on for (in decimal format) and advicefee is a number. Thus, to work out the cost of the advice for a matter I use SUM(advicetime*advicefee).
What I wish to do is to add up all of the 'fee' values when 'fixedfee'=Y and also the sum of all of the SUM(advicetime*advicefee) values for all of these matters.
I have tried using:
SELECT
    SUM(matters.fee) AS totfixed,
    SUM(advicetime*advicefee) AS totbills
FROM matters
INNER JOIN actions
    ON matters.matterid = actions.matterid
WHERE fixedfee = 'Y'
but this doesn't work as (I think) it is adding up the matters.fee for every time there is an action. I have also tried making it
SUM(DISTINCT matters.fee) AS totfixed
but this doesn't work as I think it seems to be missing out any identical fees (and there are several matters which have the same fixed fee).
I am fairly new to this so any help would be very welcome.
but this doesn't work as (I think) it is adding up the matters.fee for every time there is an action. I have also tried making it ...
You're experiencing an aggregate fanout issue. This happens whenever the primary table in a SELECT query has fewer rows than a secondary table to which it is joined: the join produces duplicate rows, so when aggregate functions are applied, they act on the extra rows.
Here the primary table refers to the one where aggregate functions are applied. In your example,
* SUM(matters.fee) >> aggregation on table matters.
* SUM(advicetime*advicefee) >> aggregation on table actions
* fixedfee='Y' >> where condition on table matters
To avoid the fanout issue:
* Always apply the aggregates to the most granular table in a join.
* Unless two tables have a one-to-one relationship, don't apply aggregate functions on fields from both tables.
* Obtain your aggregates separately through different subqueries and then combine the result. This can be done in a SQL statement, or you can export the data and then do it.
Query 1:
SELECT SUM(fee) AS totfixed
FROM matters
WHERE fixedfee='Y'
Query 2:
SELECT SUM(actions.advicetime*actions.advicefee) AS totbills
FROM matters
JOIN actions ON matters.matterid = actions.matterid
WHERE matters.fixedfee = 'Y'
Query 1 & Query 2 don't suffer from fanout. At this point you can export them both and deal with the results in PHP, or you can combine them in SQL:
SELECT query_2.totbills, query_1.totfixed
FROM (SELECT SUM(fee) AS totfixed
FROM matters
WHERE fixedfee='Y') query_1,
(SELECT SUM(actions.advicetime*actions.advicefee) AS totbills
FROM matters
JOIN actions ON matters.matterid = actions.matterid
WHERE matters.fixedfee = 'Y') query_2
Finally, a note on SUM(DISTINCT ...): it is valid MySQL, but it sums only the distinct values, so several matters sharing an identical fixed fee get counted once. That is exactly the behavior you observed, which is why
SUM(DISTINCT matters.fee) AS totfixed
cannot fix the fanout problem.

MySQL Query Optimization - Random Record

I'm having a terrible time with a MySQL query. I've spent most of my weekend and most of my day today attempting to make this query run a bit faster. I've made it considerably faster, but I know I can make it better.
SELECT m.id,other_fields,C.contacts_count FROM marketingDatabase AS m
LEFT OUTER JOIN
(SELECT COUNT(*) as contacts_count, rid
FROM contacts
WHERE status = 'Active' AND install_id = 'XXXX' GROUP BY rid) as C
ON C.rid = m.id
WHERE (RAND()*2612<50)
AND do_not_call != 'true'
AND `ACTUAL SALES VOLUME` >= '800000'
AND `ACTUAL SALES VOLUME` <= '1200000'
AND status = 'Pending'
AND install_id = 'XXXXX'
ORDER BY RAND()
I have an index on 'install_id', 'category' and 'status' but the EXPLAIN shows it was sorting based on 9100 rows.
My Explain is here:
https://s3.amazonaws.com/jas-so-question/Screen+Shot+2012-03-13+at+12.34.04+AM.png
Anybody have any suggestions on what I can do to make this a bit faster? The entire point of the query is to select a random record from an account's records (install_id) that matches certain criteria like sales volume, status and do_not_call. I'm currently gathering 25 records and caching it (using PHP) so I only have to run this query once every 25 requests, but I'm already dealing with thousands of requests per day. It currently takes 0.2 seconds to run. I realize that by using ORDER BY RAND() I'm already taking a major performance hit, but it's just sorting 25 rows.
Thanks in advance for the help.
**EDIT: I forgot to mention that the 'contact_sort' index is on the 'contacts' table, and indexes install_id, status, and rid. (rid references the Record ID in marketingDatabase, so it knows which record a contact belongs to.)
**EDIT 2: The 2612 number in the query represents the number of rows in marketingDatabase that match the criteria (install_id, status, actual sales volume, etc.)
Since I do not see your index definitions, I am not sure they are correct. The query would benefit from the following indexes (DDL sketch below):
a composite index (install_id, status, rid) on contacts
a composite index (install_id, status, `ACTUAL SALES VOLUME`) on marketingDatabase
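As a sketch, the corresponding DDL (the index names are my own):

ALTER TABLE contacts
    ADD INDEX idx_contacts_install_status_rid (install_id, status, rid);
ALTER TABLE marketingDatabase
    ADD INDEX idx_marketing_install_status_vol (install_id, status, `ACTUAL SALES VOLUME`);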
I played around with a few queries, and I don't think you'll ever get an indexed query to work with RAND(), especially when you're using it in both a WHERE clause and an ORDER BY clause. If at all possible, I'd introduce the random element in the PHP logic, and look at whether two simple queries make more sense than one fairly complex one. On top of that, you have a LEFT OUTER JOIN onto a random result set, which may also be adding a lot of work.
In summary, my guess would be: rewrite to exclude RAND(), and see if you can get rid of the LEFT OUTER JOIN. Two straightforward indexed queries with a bit of PHP in between may be a lot better; a sketch follows.
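A sketch of that two-query shape, assuming the random picking moves into PHP (the ids in the IN list are placeholders the application would supply):

-- Step 1: indexed and cheap; fetch only the ids that match the criteria.
SELECT id
FROM marketingDatabase
WHERE install_id = 'XXXXX'
  AND status = 'Pending'
  AND do_not_call != 'true'
  AND `ACTUAL SALES VOLUME` BETWEEN 800000 AND 1200000;

-- Step 2: the application picks 25 ids at random from that list,
-- then fetches details and contact counts for just those rows.
SELECT m.id, C.contacts_count
FROM marketingDatabase AS m
LEFT OUTER JOIN (
    SELECT rid, COUNT(*) AS contacts_count
    FROM contacts
    WHERE status = 'Active' AND install_id = 'XXXX'
    GROUP BY rid
) AS C ON C.rid = m.id
WHERE m.id IN (101, 202, 303);  -- placeholder ids picked in PHP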

MySQL query optimization

I'm trying to optimize a report query run on an ecommerce site. I'm pretty sure that I'm doing something stupid, since this query shouldn't be taking nearly as long to run as it does.
The query in question is:
SELECT inventories_name, inventories_code, SUM(shop_orders_inventories_qty) AS qty,
SUM(shop_orders_inventories_price) AS tot_price, inventories_categories_name,
inventories_price_list, inventories_id
FROM shop_orders
LEFT JOIN shop_orders_inventories ON (shop_orders_id = join_shop_orders_id)
LEFT JOIN inventories ON (join_inventories_id = inventories_id)
WHERE {$date_type} BETWEEN '{$start_date}' AND '{$end_date}'
AND shop_orders_x_response_code = 1
GROUP BY join_inventories_id, join_shop_categories_id
{$order}
{$limit}
It's basically trying to get total sales per item over a period of time; values in curly brackets are filled in via a form. It works fine for a period of a couple days, but querying a time interval of a week or more can take 30 seconds+.
I feel like it's joining way too many rows in order to calculate the aggregate values and sucking up huge amounts of memory, but I'm not sure how to limit it.
Note - I realize that I'm selecting fields which aren't in the group by, but they correspond 1-1 with inventory ID, which is in the group by.
Any suggestions?
-- Edit --
The current indices are:
inventories:
join_categories - BTREE
inventories_name, inventories_code, inventories_description - FULLTEXT
shop_orders_inventories:
shop_orders_inventories_id - BTREE
shop_orders:
shop_orders_id - BTREE
Two sequential LEFT JOINs can take quite a while on a big table. Try using JOIN instead of LEFT JOIN (unless you have records in shop_orders with no matching records in shop_orders_inventories or inventories), or split this query into a couple of smaller ones. Also, by using SUM and GROUP BY you are forcing MySQL to create temporary tables; you might want to increase the MySQL cache so those tables fit into memory (otherwise MySQL will dump them to disk, which will also increase SQL execution time).
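Concretely, swapping the join types in the original query would look like this (valid only if every qualifying order has matching rows in both joined tables; shop_orders_date is a stand-in for whatever {$date_type} resolves to):

SELECT inventories_name, inventories_code,
       SUM(shop_orders_inventories_qty) AS qty,
       SUM(shop_orders_inventories_price) AS tot_price,
       inventories_categories_name, inventories_price_list, inventories_id
FROM shop_orders
JOIN shop_orders_inventories ON (shop_orders_id = join_shop_orders_id)
JOIN inventories ON (join_inventories_id = inventories_id)
WHERE shop_orders_date BETWEEN '2011-01-01' AND '2011-01-07'  -- stand-in column
  AND shop_orders_x_response_code = 1
GROUP BY join_inventories_id, join_shop_categories_id;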
The first and foremost rule to indexing is... index the columns that you will search on!
For each possible value of {$date_type}, create an index for that date column.
Once you have lots of data in the table (say 2 years or 100 weeks), a single week's data is 1% of the index, so it becomes a good starting point.
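As a sketch, with shop_orders_date_created standing in for whatever columns {$date_type} can be:

-- One index per possible {$date_type} column (the name below is a stand-in):
CREATE INDEX idx_orders_date ON shop_orders (shop_orders_date_created);

-- Since the query also filters on shop_orders_x_response_code = 1,
-- a composite index may serve the WHERE clause even better:
CREATE INDEX idx_orders_response_date
    ON shop_orders (shop_orders_x_response_code, shop_orders_date_created);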
Even though MySQL allows non-aggregates in the SELECT clause, I personally would sync the two lists:
SELECT inventories_name, inventories_code,
SUM(shop_orders_inventories_qty) AS qty,
SUM(shop_orders_inventories_price) AS tot_price,
inventories_categories_name, inventories_price_list, inventories_id
FROM ...
GROUP BY inventories_id, join_shop_categories_id, inventories_name,
inventories_code, inventories_categories_name, inventories_price_list
...
