MySQL SUM() giving incorrect total - php

I am developing a php/mysql database. I have two tables - matters and actions.
Amongst other fields, the matters table contains 'matterid', 'fixedfee' and 'fee'. fixedfee is Y or N, and fee can be any number.
For any matter there can be a number of actions. The actions table contains 'actionid' 'matterid' 'advicetime' 'advicefee'. The advicetime is how long the advice goes on for (in decimal format) and advicefee is a number. Thus, to work out the cost of the advice for a matter I use SUM(advicetime*advicefee).
What I wish to do is to add up all of the 'fee' values when 'fixedfee'=Y and also the sum of all of the SUM(advicetime*advicefee) values for all of these matters.
I have tried using:
SELECT
SUM(matters.fee) AS totfixed,
SUM(advicetime*advicefee) AS totbills
FROM matters
INNER JOIN actions
ON matters.matterid = actions.matterid
WHERE fixedfee = 'Y'
but this doesn't work as (I think) it is adding up the matters.fee for every time there is an action. I have also tried making it
SUM(DISTINCT matters.fee) AS totfixed
but this doesn't work as I think it seems to be missing out any identical fees (and there are several matters which have the same fixed fee).
I am fairly new to this so any help would be very welcome.

but this doesn't work as (I think) it is adding up the matters.fee for every time there is an action. I have also tried making it ...
You're running into an aggregate fanout issue. It happens whenever a row in the primary table matches more than one row in a table it is joined to: the join repeats the primary row once per match, so aggregate functions applied to the primary table's columns act on those duplicated rows.
Here the primary table refers to the one where aggregate functions are applied. In your example,
* SUM(matters.fee) >> aggregation on table matters.
* SUM(advicetime*advicefee) >> aggregation on table actions
* fixedfee='Y' >> where condition on table matters
To avoid the fanout issue:
* Always apply the aggregates to the most granular table in a join.
* Unless two tables have a one-to-one relationship, don't apply aggregate functions on fields from both tables.
* Obtain your aggregates separately through different subqueries and then combine the result. This can be done in a SQL statement, or you can export the data and then do it.
Query 1:
SELECT SUM(fee) AS totfixed
FROM matters
WHERE fixedfee='Y'
Query 2:
SELECT SUM(actions.advicetime*actions.advicefee) AS totbills
FROM matters
JOIN actions ON matters.matterid = actions.matterid
WHERE matters.fixedfee = 'Y'
Query 1 & Query 2 don't suffer from fanout. At this point you can export them both and deal with the result in php. Or you can combine them in SQL:
SELECT query_2.totbills, query_1.totfixed
FROM (SELECT SUM(fee) AS totfixed
FROM matters
WHERE fixedfee='Y') query_1,
(SELECT SUM(actions.advicetime*actions.advicefee) AS totbills
FROM matters
JOIN actions ON matters.matterid = actions.matterid
WHERE matters.fixedfee = 'Y') query_2
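Finally, a note on DISTINCT. SUM(DISTINCT expr) is valid in MySQL (COUNT, AVG, MIN, MAX and GROUP_CONCAT accept DISTINCT as well), but it adds each distinct value only once. Matters that happen to share the same fee are therefore counted a single time, which is exactly the undercount you noticed, so the following does not fix the fanout: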
SUM(DISTINCT matters.fee) AS totfixed
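The fanout, the SUM(DISTINCT ...) undercount and the separate-subquery fix can be reproduced on a tiny data set. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE matters (matterid INTEGER PRIMARY KEY, fixedfee TEXT, fee REAL);
    CREATE TABLE actions (actionid INTEGER PRIMARY KEY, matterid INTEGER,
                          advicetime REAL, advicefee REAL);
    INSERT INTO matters VALUES (1, 'Y', 500), (2, 'Y', 500), (3, 'N', 0);
    INSERT INTO actions VALUES
        (1, 1, 1.0, 100), (2, 1, 2.0, 100),   -- matter 1 has two actions
        (3, 2, 0.5, 200),                     -- matter 2 has one action
        (4, 3, 1.0, 150);                     -- matter 3 is not fixed fee
""")

# Joined query: matter 1's fee appears once per action, so it is added twice.
fanout_total = conn.execute("""
    SELECT SUM(matters.fee)
    FROM matters
    JOIN actions ON matters.matterid = actions.matterid
    WHERE matters.fixedfee = 'Y'
""").fetchone()[0]

# SUM(DISTINCT ...) collapses the two identical 500 fees into a single 500.
distinct_total = conn.execute("""
    SELECT SUM(DISTINCT matters.fee)
    FROM matters
    JOIN actions ON matters.matterid = actions.matterid
    WHERE matters.fixedfee = 'Y'
""").fetchone()[0]

# Each aggregate is computed in its own subquery, then the results are combined.
totfixed, totbills = conn.execute("""
    SELECT query_1.totfixed, query_2.totbills
    FROM (SELECT SUM(fee) AS totfixed
          FROM matters
          WHERE fixedfee = 'Y') query_1,
         (SELECT SUM(actions.advicetime * actions.advicefee) AS totbills
          FROM matters
          JOIN actions ON matters.matterid = actions.matterid
          WHERE matters.fixedfee = 'Y') query_2
""").fetchone()

print("joined SUM:", fanout_total)
print("SUM(DISTINCT):", distinct_total)
print("separate subqueries:", totfixed, totbills)
```

With these rows the true fixed-fee total is 1000; the joined query overshoots and the DISTINCT variant undershoots, while the subquery version returns the expected figures.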

Related

ROW wise SUM VS COLUMN wise SUM in MySQL

I have a table, tableA, with the following structure.
I changed this structure into tableB, shown below, to reduce the number of rows; the number of categories is fixed.
Assume tableA holds 21 lakh (2.1 million) rows; after conversion to the new structure, tableB contains only 70k rows.
In some cases I want to SUM all the values in the table:
QUERY1: SELECT SUM(val) AS total FROM tableA;
vs
QUERY2: SELECT SUM(cate1+cate2+cate3) AS total FROM tableB;
QUERY1 executes faster than QUERY2, even though tableB contains fewer rows than tableA.
I expected QUERY2 to be faster, but QUERY1 is the faster one.
Please help me understand why QUERY2 performs worse.
MySQL is optimized to speed up relational operations. There is not so much effort at speeding up the other kinds of operations MySQL can perform. Cate1+Cate2+Cate3 is a perfectly legitimate operation, but there's nothing particularly relational about it.
tableA is actually simpler in terms of the relational model of data than tableB, even though tableA has more rows. It's worth noting in passing that tableA conforms to first normal form but tableB does not. Those three columns are really a repeating group even though they have been made to look as if they are not.
So first normal form is good for you in terms of performance (most of the time).
In your first query, MySQL just needs to do the summation (one step).
In your second query, MySQL first needs to do an arithmetic addition across three columns for each row, and then sum the results (two steps).
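To make the two shapes concrete, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL. The column layouts of tableA and tableB are assumptions, since the original structures are not shown; the sketch only checks that both queries yield the same total and says nothing about speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed layouts: one row per (item, category) versus one row per item.
    CREATE TABLE tableA (item_id INTEGER, category TEXT, val INTEGER);
    CREATE TABLE tableB (item_id INTEGER PRIMARY KEY,
                         cate1 INTEGER, cate2 INTEGER, cate3 INTEGER);
    INSERT INTO tableA VALUES
        (1, 'cate1', 10), (1, 'cate2', 20), (1, 'cate3', 30),
        (2, 'cate1', 1),  (2, 'cate2', 2),  (2, 'cate3', 3);
    INSERT INTO tableB VALUES (1, 10, 20, 30), (2, 1, 2, 3);
""")

# QUERY1: a single aggregation over one column.
total_a = conn.execute("SELECT SUM(val) FROM tableA").fetchone()[0]

# QUERY2: a per-row addition across three columns, then the aggregation.
total_b = conn.execute(
    "SELECT SUM(cate1 + cate2 + cate3) FROM tableB"
).fetchone()[0]

print(total_a, total_b)
```

Timing the two forms on the real tables, on the real server, is the only reliable way to compare them.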

Mysql search query

I am developing a car rental site. I have two tables test_tbl_cars and test_reservations.
I am using the search query (cribbed from Jon Kloske in "How do I approach this PHP/MYSQL query?"):
$sql = mysql_query("SELECT
test_tbl_cars.*,
SUM(rental_start_date <= '$ddate' AND rental_end_date >= '$adate') AS ExistingReservations
FROM test_tbl_cars
LEFT JOIN test_reservations USING (car_id)
GROUP BY car_id
HAVING ExistingReservations = 0");
This gives me excellent search results but the test_tbl_cars table contains many cars which in any given search returns several of the same car model as being available.
How can I filter the query return such that I get one of each model available?
Use the DISTINCT clause:
$sql = mysql_query("SELECT
DISTINCT test_tbl_cars.model, test_tbl_cars.*,
SUM(rental_start_date <= '$ddate' AND rental_end_date >= '$adate') AS ExistingReservations
FROM test_tbl_cars
LEFT JOIN test_reservations USING (car_id)
GROUP BY car_id
HAVING ExistingReservations = 0");
Awww, should have tagged me, I only saw this now over a year later! You've probably already figured out how to work this by now, but I'll take a crack at it anyway for completeness sake and because most of the answers here I don't think are doing what you want.
So the problem you are having is that in the other question each room had a unique ID and it was unique rooms people were interested in booking. Here, you're extending the concept of a bookable item to a pool of items of a particular class (in this case, model of car).
There may be a way to do this without subqueries, but by far the easiest way is to simply take the original idea from my other answer and extend it by wrapping it up in another query that does the grouping into models (and as you'll see shortly, we get a bunch of other useful stuff for free out of doing this).
So, firstly, let's start by getting the list of cars with counts of conflicting reservations (as per the update to my other answer):
(I'll use your query for these examples as a starting point, but note you really should use prepared statements or at the very least escaping functions supplied by your DB driver for the two parameters you're passing)
SELECT car_id, model_id, SUM(IF(rental_id IS NULL, 0, rental_start_date <= '$ddate' AND rental_end_date >= '$adate')) AS ConflictingReservations
FROM test_tbl_cars
LEFT JOIN test_reservations USING (car_id)
GROUP BY car_id
This will return one row per car_id giving you the model number, and the number of reservations that conflict with the date range you've specified (0 or more).
Now at this stage if we were asking about individual cars (rather than just models of cars available) we could restrict and order the results with "HAVING ConflictingReservations = 0 ORDER BY model_id" or something.
But, if we want to get a list of the availability of ~models~, we need to perform a further grouping of these results to get the final answer:
SELECT model_id,
COUNT(*) AS TotalCars,
SUM(ConflictingReservations = 0) AS FreeCars,
CAST(IFNULL(GROUP_CONCAT(IF(ConflictingReservations = 0, car_id, NULL) ORDER BY car_id ASC), '') AS CHAR) AS FreeCarsList
FROM (
SELECT car_id, model_id, SUM(IF(rental_id IS NULL, 0, rental_start_date <= '$ddate' AND rental_end_date >= '$adate')) AS ConflictingReservations
FROM test_tbl_cars
LEFT JOIN test_reservations USING (car_id)
GROUP BY car_id
) AS CarReservations
GROUP BY model_id
You'll notice all we're doing is grouping the original query by model_id, and then using aggregate functions to get us the model_id, a count of total cars we have of this model, a count of free cars of this model we have which we achieve by counting all the times a car has zero ConflictingReservations, and finally a cute little bit of SQL that returns a comma separated list of the car_ids of the free cars (in case that was also needed!)
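As a runnable illustration of the two-level grouping, and of passing the dates as bound parameters rather than interpolating them, here is a sketch using Python's sqlite3 module in place of MySQL. The sample rows are invented, the IF(...) is written as CASE because SQLite's dialect differs, the FreeCarsList column is left out, and it assumes $adate is the start and $ddate the end of the requested period.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test_tbl_cars (car_id INTEGER PRIMARY KEY, model_id INTEGER);
    CREATE TABLE test_reservations (rental_id INTEGER PRIMARY KEY, car_id INTEGER,
                                    rental_start_date TEXT, rental_end_date TEXT);
    -- Hypothetical data: two cars of model 10, one car of model 20.
    INSERT INTO test_tbl_cars VALUES (1, 10), (2, 10), (3, 20);
    INSERT INTO test_reservations VALUES
        (1, 1, '2024-05-01', '2024-05-10'),
        (2, 3, '2024-06-01', '2024-06-05');
""")

AVAILABILITY_SQL = """
    SELECT model_id,
           COUNT(*) AS TotalCars,
           SUM(ConflictingReservations = 0) AS FreeCars
    FROM (
        -- Inner query: one row per car with its count of overlapping bookings.
        SELECT car_id, model_id,
               SUM(CASE WHEN rental_id IS NULL THEN 0
                        ELSE (rental_start_date <= ? AND rental_end_date >= ?)
                   END) AS ConflictingReservations
        FROM test_tbl_cars
        LEFT JOIN test_reservations USING (car_id)
        GROUP BY car_id
    ) AS CarReservations
    GROUP BY model_id
    ORDER BY model_id
"""


def model_availability(conn, requested_start, requested_end):
    """Return (model_id, total cars, free cars) rows for the requested period."""
    # Placeholder order follows the SQL: period end first, then period start.
    return conn.execute(AVAILABILITY_SQL, (requested_end, requested_start)).fetchall()


print(model_availability(conn, "2024-05-05", "2024-05-07"))
```

Filtering the result on FreeCars > 0 would give one row per model that still has a car available.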
A quick word on performance: all the left joins, group bys, and subqueries could make this query very slow indeed. The good news is the outer group by should only have to process as many rows as you have cars, so it shouldn't be slow until you end up with a very large number of cars. The inner query, however, joins two tables (which can be done quite quickly with indexes) and then groups the entire set, performing functions on each row. This could get quite slow, particularly as the number of reservations and cars increases. To alleviate this you could use where clauses on the inner query and combine that with appropriate indexes to reduce the number of items you are inspecting. There are also other tricks you can use to move the comparison of the start and end dates into the join condition, but that's a topic for another day :)
And finally, as always, if there are incorrect edge cases, mistakes, wrong syntax, whatever - let me know and I'll edit to correct!

MySQL Query Optimization - Random Record

I'm having a terrible time with a MySQL query. I've spent most of my weekend and most of my day today attempting to make this query run a bit faster. I've made it considerably faster, but I know I can make it better.
SELECT m.id,other_fields,C.contacts_count FROM marketingDatabase AS m
LEFT OUTER JOIN
(SELECT COUNT(*) as contacts_count, rid
FROM contacts
WHERE status = 'Active' AND install_id = 'XXXX' GROUP BY rid) as C
ON C.rid = m.id
WHERE (RAND()*2612<50)
AND do_not_call != 'true'
AND `ACTUAL SALES VOLUME` >= '800000'
AND `ACTUAL SALES VOLUME` <= '1200000'
AND status = 'Pending'
AND install_id = 'XXXXX'
ORDER BY RAND()
I have an index on 'install_id', 'category' and 'status' but the EXPLAIN shows it was sorting based on 9100 rows.
My Explain is here:
https://s3.amazonaws.com/jas-so-question/Screen+Shot+2012-03-13+at+12.34.04+AM.png
Anybody have any suggestions on what I can do to make this a bit faster? The entire point of the query is to select a random record from an account's records (install_id) that matches certain criteria like sales volume, status and do_not_call. I'm currently gathering 25 records and caching it (using PHP) so I only have to run this query once every 25 requests, but I'm already dealing with thousands of requests per day. It currently takes 0.2 seconds to run. I realize that by using ORDER BY RAND() I'm already taking a major performance hit, but it's just sorting 25 rows.
Thanks in advance for the help.
**EDIT: I forgot to mention that the 'contact_sort' index is on the 'contacts' table, and indexes install_id, status, and rid. (rid references the record ID in marketingDatabase, so it knows which record a contact belongs to.)
**EDIT 2: The 2612 number in the query represents the number of rows in marketingDatabase that match the criteria (install_id, status, actual sales volume, etc.)
Since I do not see your index definitions, I am not sure they are correct. The query would benefit from the following indexes:
* a composite index (install_id, status, rid) on contacts
* a composite index (install_id, status, `ACTUAL SALES VOLUME`) on marketingDatabase
I played around with a few queries, and I don't think you'll ever be able to get an indexed query to work with RAND(), especially when you're using it in both a WHERE clause and an ORDER BY clause. If at all possible, I'd introduce the random element in my PHP logic, and probably look at whether two simple queries made more sense than one fairly complex one. Added to that, you have a LEFT OUTER JOIN on a random result set, which may also add considerably to the amount of work that has to be done.
In summary, my guess would be - rewrite to exclude RAND, see if you can get rid of the LEFT OUTER JOIN. Two straightforward indexed queries with a bit of PHP in between may be a lot better.
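One way the "two simple queries with application logic in between" idea could look is sketched below. It is written in Python with sqlite3 as a stand-in for PHP and MySQL; the function name random_records, the md_lookup index name and the sample rows are hypothetical, and unlike the original query it reports a contact count of 0 rather than NULL for records without contacts.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE marketingDatabase (id INTEGER PRIMARY KEY, install_id TEXT,
        status TEXT, do_not_call TEXT, `ACTUAL SALES VOLUME` INTEGER);
    CREATE TABLE contacts (id INTEGER PRIMARY KEY, rid INTEGER,
        install_id TEXT, status TEXT);
    CREATE INDEX md_lookup
        ON marketingDatabase (install_id, status, `ACTUAL SALES VOLUME`);
    CREATE INDEX contact_sort ON contacts (install_id, status, rid);
""")
# Hypothetical data: 100 pending records, a few active contacts.
conn.executemany(
    "INSERT INTO marketingDatabase VALUES (?, 'XXXX', 'Pending', 'false', ?)",
    [(i, 800000 + i * 1000) for i in range(1, 101)],
)
conn.executemany(
    "INSERT INTO contacts (rid, install_id, status) VALUES (?, 'XXXX', 'Active')",
    [(1,), (1,), (2,)],
)


def random_records(conn, install_id, how_many, rng=random):
    """Pick up to how_many matching records at random, with contact counts."""
    # Query 1: plain filtered lookup of candidate ids, no RAND() involved.
    ids = [row[0] for row in conn.execute(
        """SELECT id FROM marketingDatabase
           WHERE install_id = ? AND status = 'Pending'
             AND do_not_call != 'true'
             AND `ACTUAL SALES VOLUME` BETWEEN 800000 AND 1200000""",
        (install_id,),
    )]
    # The random element lives in application code instead of the database.
    chosen = rng.sample(ids, min(how_many, len(ids)))
    if not chosen:
        return []
    placeholders = ",".join("?" * len(chosen))
    # Query 2: fetch only the chosen rows and count their active contacts.
    return conn.execute(
        f"""SELECT m.id, COUNT(c.rid) AS contacts_count
            FROM marketingDatabase AS m
            LEFT JOIN contacts AS c
              ON c.rid = m.id AND c.status = 'Active' AND c.install_id = ?
            WHERE m.id IN ({placeholders})
            GROUP BY m.id""",
        [install_id, *chosen],
    ).fetchall()


print(random_records(conn, "XXXX", 25))
```

The join in the second query only ever touches the handful of chosen ids, so the sort over thousands of rows disappears; whether that is faster in practice would need to be measured on the real data.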

trick to get the sum of MySQL rows outside of MySQL

so I have the following query:
SELECT DISTINCT d.iID1 as 'id',
SUM(d.sum + d.count*r.lp)/sum(d.count) AS avgrat
FROM abcd i, abce r, abcf d
WHERE r.aID = 1 AND
d.iID1 <> r.rID AND d.iID2 = r.rID GROUP BY d.iID1
ORDER BY avgrat LIMIT 50;
The problem is that, with millions of entries in the table, SUM() and GROUP BY freeze up the query. Is there a way to do exactly this that would execute instantaneously using MySQL and/or PHP hacks (perhaps doing the summing in PHP, but how would I go about that)?
To answer the direct question: no, there is no way to do anything instantaneously.
If you have control over the table updates, or the application which adds the relevant records, then you could add logic which updates another table with the sum, count, and id with each update. Then a revised query targets the "sum table" and trivially calculates the averages.
One solution is to create a rollup table that holds your aggregate values, using triggers on your source tables to keep it up to date.
You will need to decide whether the overhead of the triggers is less than that of the query.
Some important factors are:
* The frequency of the source table updates
* The run frequency of the aggregate query
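A minimal sketch of such a rollup table is shown below, using Python's sqlite3 module rather than MySQL, so the trigger syntax differs slightly from what MySQL expects. The ratings and rating_totals tables are hypothetical simplifications of the original schema, and only inserts are handled; updates and deletes would need triggers of their own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical source table and its rollup table.
    CREATE TABLE ratings (item_id INTEGER, value REAL);
    CREATE TABLE rating_totals (item_id INTEGER PRIMARY KEY,
                                total REAL NOT NULL DEFAULT 0,
                                n INTEGER NOT NULL DEFAULT 0);

    -- Keep the running sum and count current on every insert.
    CREATE TRIGGER ratings_after_insert AFTER INSERT ON ratings
    BEGIN
        INSERT OR IGNORE INTO rating_totals (item_id) VALUES (NEW.item_id);
        UPDATE rating_totals
           SET total = total + NEW.value, n = n + 1
         WHERE item_id = NEW.item_id;
    END;
""")
conn.executemany("INSERT INTO ratings VALUES (?, ?)",
                 [(1, 4.0), (1, 2.0), (2, 5.0)])


def average_ratings(conn):
    """Read the averages from the rollup table instead of scanning ratings."""
    return conn.execute(
        """SELECT item_id, total / n AS avgrat
           FROM rating_totals
           ORDER BY avgrat
           LIMIT 50"""
    ).fetchall()


print(average_ratings(conn))
```

The read query now touches one small row per id, at the cost of a little extra work on every write.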

MYSQL query optimization

I'm trying to optimize a report query run on an ecommerce site. I'm pretty sure that I'm doing something stupid, since this query shouldn't be taking nearly as long to run as it does.
The query in question is:
SELECT inventories_name, inventories_code, SUM(shop_orders_inventories_qty) AS qty,
SUM(shop_orders_inventories_price) AS tot_price, inventories_categories_name,
inventories_price_list, inventories_id
FROM shop_orders
LEFT JOIN shop_orders_inventories ON (shop_orders_id = join_shop_orders_id)
LEFT JOIN inventories ON (join_inventories_id = inventories_id)
WHERE {$date_type} BETWEEN '{$start_date}' AND '{$end_date}'
AND shop_orders_x_response_code = 1
GROUP BY join_inventories_id, join_shop_categories_id
{$order}
{$limit}
It's basically trying to get total sales per item over a period of time; values in curly brackets are filled in via a form. It works fine for a period of a couple days, but querying a time interval of a week or more can take 30 seconds+.
I feel like it's joining way too many rows in order to calculate the aggregate values and sucking up huge amounts of memory, but I'm not sure how to limit it.
Note - I realize that I'm selecting fields which aren't in the group by, but they correspond 1-1 with inventory ID, which is in the group by.
Any suggestions?
-- Edit --
The current indices are:
inventories:
join_categories - BTREE
inventories_name, inventories_code, inventories_description - FULLTEXT
shop_orders_inventories:
shop_orders_inventories_id - BTREE
shop_orders:
shop_orders_id - BTREE
Two sequential left joins can take quite a long time on a big table. Try to use "join" instead of "left join" (unless you have records in shop_orders with no matching records in shop_orders_inventories or inventories), or split this query into a couple of smaller ones. Also, by using "sum" and "group by" you are likely forcing MySQL to create temporary tables. You might want to raise the in-memory temporary table limits (for example tmp_table_size and max_heap_table_size) so those tables fit into memory; otherwise MySQL will write them to disk, which also increases execution time.
The first and foremost rule of indexing is: index the columns that you will search on!
For each possible value of {$date_type}, create an index for that date column.
Once you have lots of data in the table (say 2 years or 100 weeks), a single week's data is 1% of the index, so it becomes a good starting point.
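The effect of indexing the date column can be checked by comparing query plans before and after creating the index. The sketch below uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN; the column name shop_orders_date and the index name are hypothetical, since the actual values of {$date_type} are not given.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE shop_orders (
        shop_orders_id INTEGER PRIMARY KEY,
        shop_orders_date TEXT,               -- hypothetical date column
        shop_orders_x_response_code INTEGER
    )
""")

REPORT_FILTER = """
    SELECT shop_orders_id
    FROM shop_orders
    WHERE shop_orders_date BETWEEN ? AND ?
      AND shop_orders_x_response_code = 1
"""


def query_plan(conn):
    """Return the planner's description of how the filter would be executed."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + REPORT_FILTER,
                        ("2024-01-01", "2024-01-07")).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(conn)
conn.execute(
    "CREATE INDEX idx_shop_orders_date ON shop_orders (shop_orders_date)"
)
plan_after = query_plan(conn)

print("before:", plan_before)
print("after: ", plan_after)
```

On MySQL the equivalent check is to run EXPLAIN on the report query and confirm that the date index appears in the key column.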
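Even though MySQL allows non-aggregated columns in the SELECT clause that are not in the GROUP BY (when ONLY_FULL_GROUP_BY is not enforced), I personally would keep the SELECT list and the GROUP BY in sync: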
SELECT inventories_name, inventories_code,
SUM(shop_orders_inventories_qty) AS qty,
SUM(shop_orders_inventories_price) AS tot_price,
inventories_categories_name, inventories_price_list, inventories_id
FROM ...
GROUP BY inventories_id, join_shop_categories_id, inventories_name,
inventories_code, inventories_categories_name, inventories_price_list
...
