Quicker way to sum sections of an SQL column - php

Say I have a table with 1million rows. One column lists the "Group", and another lists "Sales". The Group #'s range from 1 to 100,000 such that each Group has about 10 Sales entries. I want to somehow summarize the data into 100,000 rows with the sum of Sales for each group rather than each individual sale.
My method so far has been to run a PHP loop from 1 to 100,000 where each iteration sends an SQL query to sum(Sales) WHERE Group=$i. Then I can either echo it into an html table, or insert it into a new SQL table. Problem is it takes hours this method.
Any tips on how I can improve this process? Is there a way to write this as a single SQL query that will massively increase speed? Thanks

Just try a GROUP BY:
SELECT `group`, sum(sales)
FROM your_table
GROUP BY `group`
Edit to include back ticks for group. Without them you will receive an error

You should always avoid a SQL query in a loop unless there's no other solution. In this case, you can grab all the fields at once and have them in an array and add them up in PHP that way.

Related

SQL select all files where a value in table A is the same as in table B (same database)

I'm building a sales system for a company, but I'm stuck with the following issue.
Every day I load .XML productfeed into a database called items. The rows in the productfeed are never in the same order, so sometimes the row with Referentie = 380083 is at the very top, and the other day that very same row is at the very bottum.
I also have to get all the instock value, but when I run the following query
SELECT `instock` FROM SomeTable WHERE `id` > 0
I get all values, but not in the same order as in the other table.
So I have to get the instock value of all rows where referentie in table A is the same as it is in table B.
I already have this query:
select * from `16-11-23 wed 09:37` where `referentie` LIKE '4210310AS'
and this query does the right job, but I have like 500 rows in the table.
So I need to find a way to automate the: LIKE '4210310AS' bit, so it selects all 500 values in one go.
Can somebody tell me how that can be done?
I'm not even sure I understand your problem...
Don't take this personally, but you seem to be concerned/confused by the ordering of the data in the tables which suggests to me your understanding of relational databases and SQL is lacking. I suggest you brush up on the basics.
Can't you just use the following query?
SELECT a.referentie
, b.instock
FROM tableA a
, tableB b
WHERE b.referentie = a.referentie

Multiple 'Select column where ...' vs Select all the column

I have to select 4 rows randomly from a column.
Is is better to generate randomly 4 id and to perform 4 requests 'select column from database where id = ... '
Or to select all the rows in one request and to choose after?
If you are capable of generating random existing id's, I think the best approach is to use a clause like where id in (id1, id2, id3, id4). This will result in getting 4 records in one query, so no unnecessary query's or records are fetched.
As told before, where id in (id1, id2, id3, id4) is the fastest way from the MySQL perspective. How ever, you will need some logic in the application generating those IDs : All 4 IDs shall exist, be randomly distributed, and you want to avoid duplicates. In worst case you will be retrieving a list of all existend IDs with a huge query, extracting 4 random values, and querying again.
With all that logic to be done, it can be wise to move selection into MySQL:
SELECT * FROM foobar
ORDER BY RAND()
LIMIT 4;
You must understand that this is slow in mysql, but you have a speed gain in the application logic and can be sure to get random values equally seed all over your table.
EDIT:
The comment asks if PHP is fasten in this task then MySQL. Answer is no.
It is not done by "using rand". You need to have an array containing all those IDs in PHP. That is a huge query, lots of TCP traffic, huge array to be buildt in php, huge btree to be buildt by zend engine. Then, with the IDs, you must fire a second query to get the rows for those IDs.
Although the RAND() function may be slow, so far I have not had significant problems with speed. MY strategy is actually to join the database back to a query of itself returning a list of random IDs with a limit.
SELECT *
FROM table AS t1
JOIN (
SELECT rowID
FROM table
ORDER BY RAND()
LIMIT 4
) AS t2
WHERE t1.rowID = t2.rowID
There is also a more robust solution that exist - try checking out this question (asked in 2010).

MySQL Query Optimization - Random Record

I'm having a terrible time with a MySQL query. I've spent most of my weekend and most of my day today attempting to make this query run a bit faster. I've made it considerably faster, but I know I can make it better.
SELECT m.id,other_fields,C.contacts_count FROM marketingDatabase AS m
LEFT OUTER JOIN
(SELECT COUNT(*) as contacts_count, rid
FROM contacts
WHERE status = 'Active' AND install_id = 'XXXX' GROUP BY rid) as C
ON C.rid = m.id
WHERE (RAND()*2612<50)
AND do_not_call != 'true'
AND `ACTUAL SALES VOLUME` >= '800000'
AND `ACTUAL SALES VOLUME` <= '1200000'
AND status = 'Pending'
AND install_id = 'XXXXX'
ORDER BY RAND()
I have an index on 'install_id', 'category' and 'status' but the EXPLAIN shows it was sorting based on 9100 rows.
My Explain is here:
https://s3.amazonaws.com/jas-so-question/Screen+Shot+2012-03-13+at+12.34.04+AM.png
Anybody have any suggestions on what I can do to make this a bit faster? The entire point of the query is to select a random record from an account's records (install_id) that matches certain criteria like sales volume, status and do_not_call. I'm currently gathering 25 records and caching it (using PHP) so I only have to run this query once every 25 requests, but I'm already dealing with thousands of requests per day. It currently takes 0.2 seconds to run. I realize that by using ORDER BY RAND() I'm already taking a major performance hit, but it's just sorting 25 rows.
Thanks in advance for the help.
**EDIT: I forgot to mention that the 'contact_sort' index is on the 'contacts' table, and indexes install_id, status, and rid. (rid references Record ID in marketingDatabase so it knows which record a contact belongs to.
**EDIT 2: The 2612 number in the query represents the number of rows in marketingDatabase that match the criteria (install_id, status, actual sales volume, etc.)
Since I do not see your index definitions, I am not sure they are correct. The query would benefit from the following indexes:
a composite index (install_id, status, rid) on the contacts
a composite index (install_id, status, `ACTUAL SALES VOLUME`) on marketingDatabase
I played around with a few queries, and I don't think you'll ever be able to get a indexed query to work with RAND(), especially when you're using it in both a WHERE clause and an ORDER BY clause. If at all possible, I'd introduce the random element in my PHP logic, and probably look at whether two simple queries made more sense than one fairly complex one. Added to that, you have LEFT OUTER JOIN on a random result set, which may also be increasing the amount of work that has to be done a lot.
In summary, my guess would be - rewrite to exclude RAND, see if you can get rid of the LEFT OUTER JOIN. Two straightforward indexed queries with a bit of PHP in between may be a lot better.

trick to get the sum of MySQL rows outside of MySQL

so I have the following query:
SELECT DISTINCT d.iID1 as 'id',
SUM(d.sum + d.count*r.lp)/sum(d.count) AS avgrat
FROM abcd i, abce r, abcf d
WHERE r.aID = 1 AND
d.iID1 <> r.rID AND d.iID2 = r.rID GROUP BY d.iID1
ORDER BY avgrat LIMIT 50;
the problem is....with millions of entries in the table, SUM() and GROUP BY would freeze up the query....is there a way to do exactly this that would execute instantaneously using MySQL and/or PHP hacks (perhaps do the summing with PHP....but how would I go about doing that...)
To answer the direct question: no, there is no way to do anything instantaneously.
If you have control over the table updates, or the application which adds the relevant records, then you could add logic which updates another table with the sum, count, and id with each update. Then a revised query targets the "sum table" and trivially calculates the averages.
One solution is to create a rollup table that holds your aggregate values
using a triggers on your source tables to keep it up to date.
You will need to decide if the overhead of the triggers is less then that of the query.
some important factors are:
The frequency of the source table updates
The run frequency of the aggregate query.

MYSQL query optimization

I'm trying to optimize a report query run on an ecommerce site. I'm pretty sure that I'm doing something stupid, since this query shouldn't be taking nearly as long to run as it does.
The query in question is:
SELECT inventories_name, inventories_code, SUM(shop_orders_inventories_qty) AS qty,
SUM(shop_orders_inventories_price) AS tot_price, inventories_categories_name,
inventories_price_list, inventories_id
FROM shop_orders
LEFT JOIN shop_orders_inventories ON (shop_orders_id = join_shop_orders_id)
LEFT JOIN inventories ON (join_inventories_id = inventories_id)
WHERE {$date_type} BETWEEN '{$start_date}' AND '{$end_date}'
AND shop_orders_x_response_code = 1
GROUP BY join_inventories_id, join_shop_categories_id
{$order}
{$limit}
It's basically trying to get total sales per item over a period of time; values in curly brackets are filled in via a form. It works fine for a period of a couple days, but querying a time interval of a week or more can take 30 seconds+.
I feel like it's joining way too many rows in order to calculate the aggregate values and sucking up huge amounts of memory, but I'm not sure how to limit it.
Note - I realize that I'm selecting fields which aren't in the group by, but they correspond 1-1 with inventory ID, which is in the group by.
Any suggestions?
-- Edit --
The current indices are:
inventories:
join_categories - BTREE
inventories_name, inventories_code, inventories_description - FULLTEXT
shop_orders_inventories:
shop_orders_inventories_id - BTREE
shop_orders:
shop_orders_id - BTREE
Two sequential left joins will work quite long on a big table. Try to use "join" instead of "left join" (unless you have records in shop_orders with now matching records in shop_orders_inventories or inventories) or split this query to couple of small ones. Also by using "sum" and "group by" you are forcing MySQL to create temp tables - you might want to increase MySQL cache so those tables would fit in to memory (otherwise MySQL will dump them to disk which will also increase SQL execution time).
The first and foremost rule to indexing is... index the columns that you will search on!
For each possible value of {$date_type}, create an index for that date column.
Once you have lots of data in the table (say 2 years or 100 weeks), a single week's data is 1% of the index, so it becomes a good starting point.
Even though MySQL allows non-aggregates in the SELECT clause, I personally would sync the two
SELECT inventories_name, inventories_code,
SUM(shop_orders_inventories_qty) AS qty,
SUM(shop_orders_inventories_price) AS tot_price,
inventories_categories_name, inventories_price_list, inventories_id
FROM ...
GROUP BY inventories_id, join_shop_categories_id, inventories_name,
inventories_code, inventories_categories_name, inventories_price_list
...

Categories