MySQL time interval query overloading server - php

I am generating a CSV of how many orders I have had every month for the last 12 months, per item. At the moment I am retrieving the items and using a foreach loop to submit the following query for each item, in order to retrieve the total number of times it was ordered in each month over the past 12 months. However, the query takes just over 3 seconds, so when the loop runs a couple of thousand times, it causes the "MySQL server has gone away" error.
How could I optimise this query? Would the load on the database be reduced if I used a sub query instead?
Here is the query (which was adapted from here):
SELECT order_item_variant_alias_id, DATE(DATE_FORMAT(order_progress_time, '%Y-%m-01')) AS `trueMonth`, COUNT(*) AS count
FROM tbl_order_progress
JOIN tbl_order_items ON order_progress_order_id = order_item_order_id
JOIN tbl_product_variant_aliases ON order_item_variant_alias_id = product_variant_alias_id
JOIN tbl_product_variants ON product_variant_id = product_variant_alias_variant_id
GROUP BY product_variant_alias_id, DATE(DATE_FORMAT(order_progress_time, '%Y-%m-01'))
HAVING order_item_variant_alias_id = 1

Why are you running a separate query for each item? Just drop the having clause and put all items in at the same time.
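A sketch of that single query - your original with the HAVING line removed, so every item comes back in one result set that PHP can then split up per item:

SELECT order_item_variant_alias_id,
DATE(DATE_FORMAT(order_progress_time, '%Y-%m-01')) AS `trueMonth`,
COUNT(*) AS count
FROM tbl_order_progress
JOIN tbl_order_items ON order_progress_order_id = order_item_order_id
JOIN tbl_product_variant_aliases ON order_item_variant_alias_id = product_variant_alias_id
JOIN tbl_product_variants ON product_variant_id = product_variant_alias_variant_id
GROUP BY product_variant_alias_id,
DATE(DATE_FORMAT(order_progress_time, '%Y-%m-01'));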
If, for some reason, you do need to do one item at a time, switch the logic to a where clause rather than a having clause. The having clause will aggregate all items and then filter down to the one you want. MySQL should be faster if you reduce the data first -- using where:
SELECT order_item_variant_alias_id,
DATE(DATE_FORMAT(order_progress_time, '%Y-%m-01')) AS `trueMonth`,
COUNT(*) AS count
FROM tbl_order_progress JOIN
tbl_order_items
ON order_progress_order_id = order_item_order_id JOIN
tbl_product_variant_aliases
ON order_item_variant_alias_id = product_variant_alias_id JOIN
tbl_product_variants
ON product_variant_id = product_variant_alias_variant_id
WHERE order_item_variant_alias_id = 1
GROUP BY product_variant_alias_id,
DATE(DATE_FORMAT(order_progress_time, '%Y-%m-01'));
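Either way, the per-item filter is only cheap if it is indexed; a minimal sketch, assuming tbl_order_items has nothing covering that column yet (index name and column order are assumptions - verify with EXPLAIN):

CREATE INDEX idx_order_items_variant
ON tbl_order_items (order_item_variant_alias_id, order_item_order_id);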

Related

SQL sum and add two columns from second query

I'm trying to combine two bits of logic into one SQL query to make my codebase more efficient. I currently have the two queries like so (pseudo code):
// Looping each user in table
$statement = "SELECT id FROM users";
// I then loop that statement, combining data from sub-query
{
// Get the "summing" data from table in reference to looped user
$second_statement = "SELECT add_col_1, add_col_2 FROM users WHERE ref = id"
// Combine add_col_1, and add_col_2 together
array_sum($second_statement)
}
What I am after is the sum from that second statement: take each id returned by the first query (SELECT id FROM users), search the same table again for rows that reference it under the column name ref, and add the two columns together across those rows.
I'm not doing this just to have one statement instead of two; it's because once the app scales, issuing thousands of per-user queries is simply not good.
The statement I attempted in order to unite these two loops is:
SELECT
id,
(SUM(SELECT col_add_1+col_add_2 FROM users WHERE ref = a.id)) AS total
FROM users a
Got an execution error.
Nearly there - you just need to join the table to itself to link up all the items in one call, then group the results by id so the sum works on all the linked items.
SELECT
a.id,
SUM(b.col_add_1 + b.col_add_2) AS total
FROM
users a
LEFT JOIN
users b
ON
a.id = b.ref
GROUP BY
a.id
You got an error because SUM() cannot take a subquery as its argument; the aggregation has to happen inside the subquery. So, you should re-write your query as:
SELECT id,
(SELECT SUM(col_add_1+col_add_2) FROM users WHERE ref = a.id) AS total
FROM users a
GROUP BY id;
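Whichever form you use, the per-row lookup on ref is only cheap if ref is indexed; a minimal sketch, assuming no such index exists yet:

CREATE INDEX idx_users_ref ON users (ref);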

MySql : Reduce query execution time for pagination

I'm working on a project which has a large set of data, stored in a MySql DB.
I want to fetch records with pagination from a few tables. One of the tables has over 2 million records, which causes the page to freeze for a very long time; sometimes the page cannot load at all.
The query I'm using to fetch the records:
SELECT
EN.`MRN`,
P.`FNAME`,
P.`LNAME`,
P.`MI`,
P.`SSC`,
sum(EN.`AMOUNT`) AS `TOTAL_AMOUNT`
FROM `table_1` AS EN
INNER JOIN `table_2` AS P ON EN.`MRN` = P.`MRN`
GROUP BY EN.`MRN`,P.`FNAME`, P.`LNAME`,P.`MI`,P.`SSC`
HAVING sum(EN.`AMOUNT`) > 0
ORDER BY P.`LNAME`
This query gets me the total number of records, which the pagination needs. Then I run the same query again to get the actual records:
SELECT
EN.`MRN`,
P.`FNAME`,
P.`LNAME`,
P.`MI`,
P.`SSC`,
sum(EN.`AMOUNT`) AS `TOTAL_AMOUNT`
FROM `table_1` AS EN
INNER JOIN `table_2` AS P ON EN.`MRN` = P.`MRN`
GROUP BY EN.`MRN`,P.`FNAME`, P.`LNAME`,P.`MI`,P.`SSC`
HAVING sum(EN.`AMOUNT`) > 0
ORDER BY P.`LNAME`
LIMIT 0, 100
How can I make this query work faster? It takes a very long time to execute the query the first time, just to get the total number of records.
It is better to separate TOTAL_AMOUNT from the query, because that is the only value involving all the records. You can run this when you load the page. I assume that all your records in table_1 are valid.
SELECT sum(EN.`AMOUNT`) AS `TOTAL_AMOUNT` FROM `table_1` AS EN
HAVING sum(EN.`AMOUNT`) > 0
Then get the query result every time you flip the page. This should only return 10 records, starting from record 0.
SELECT
EN.`MRN`,
P.`FNAME`,
P.`LNAME`,
P.`MI`,
P.`SSC`
FROM `table_1` AS EN
INNER JOIN `table_2` AS P ON EN.`MRN` = P.`MRN`
GROUP BY EN.`MRN`,P.`FNAME`, P.`LNAME`,P.`MI`,P.`SSC`
HAVING sum(EN.`AMOUNT`) > 0
ORDER BY P.`LNAME`
LIMIT 10 OFFSET 0
Hope this helps.
There are a few things you can do to make this faster:
Use EXPLAIN to make sure your indices are set correctly / set the correct indices.
Only execute the complicated query once using SQL_CALC_FOUND_ROWS.
Use the TOTAL_AMOUNT alias in your HAVING statement to avoid doing the calculation twice.
MySQL will do some optimizations and caching itself so they might not all have the same impact.
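Putting the last two suggestions together, a hedged sketch (SQL_CALC_FOUND_ROWS still materializes the full grouped result once, so measure it, but it saves the second execution):

SELECT SQL_CALC_FOUND_ROWS
EN.`MRN`,
P.`FNAME`,
P.`LNAME`,
P.`MI`,
P.`SSC`,
SUM(EN.`AMOUNT`) AS `TOTAL_AMOUNT`
FROM `table_1` AS EN
INNER JOIN `table_2` AS P ON EN.`MRN` = P.`MRN`
GROUP BY EN.`MRN`, P.`FNAME`, P.`LNAME`, P.`MI`, P.`SSC`
HAVING `TOTAL_AMOUNT` > 0
ORDER BY P.`LNAME`
LIMIT 0, 100;

-- total number of matching groups, without re-running the aggregation
SELECT FOUND_ROWS();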

Displaying a large amount of data in paging table without heavily impacting DB

The current implementation is a single complex query with multiple joins and temporary tables, but it is putting too much stress on my MySQL server and takes upwards of 30+ seconds to load the table. The data is retrieved by PHP via a JavaScript Ajax call and displayed on a webpage. Here are the tables involved:
Table: table_companies
Columns: company_id, ...
Table: table_manufacture_line
Columns: line_id, line_name, ...
Table: table_product_stereo
Columns: product_id, line_id, company_id, assembly_datetime, serial_number, ...
Table: table_product_television
Columns: product_id, line_id, company_id, assembly_datetime, serial_number, warranty_expiry, ...
A single company can have 100k+ items split between the two product tables. The product tables are unioned and filtered by the line_name, then ordered by assembly_datetime and limited depending on the paging. The datetime value is also reliant on timezone and this is applied as part of the query (another JOIN + temp table). line_name is also one of the returned columns.
I was thinking of splitting the line_name filter out from the product union query. Essentially I'd determine the ids of the lines that correspond to the filter, then do a UNION query with a WHERE condition WHERE line_id IN (<results from previous query>). This would cut out the need for joins and temp tables, and I can apply the line_name to line_id and timezone modification in PHP, but I'm not sure this is the best way to go about things.
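A sketch of that two-step idea (the LIKE filter and the id list 1,2,3 are placeholders for whatever the first query actually returns):

SELECT line_id FROM table_manufacture_line WHERE line_name LIKE '%filter%';

-- then, with the resulting ids substituted in by PHP:
(SELECT line_id, assembly_datetime, serial_number
FROM table_product_stereos
WHERE company_id=# AND line_id IN (1,2,3))
UNION
(SELECT line_id, assembly_datetime, serial_number
FROM table_product_televisions
WHERE company_id=# AND line_id IN (1,2,3))
ORDER BY assembly_datetime DESC LIMIT 0,100;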
I have also looked at potentially using Redis, but the large number of individual products is leading to a similarly long wait time when pushing all of the data to Redis via PHP (20-30 seconds), even if it is just pulled in directly from the product tables.
Is it possible to tweak the existing queries to increase the efficiency?
Can I push some of the handling to PHP to decrease the load on the SQL server? What about Redis?
Is there a way to architect the tables better?
What other solution(s) would you suggest?
I appreciate any input you can provide.
Edit:
Existing query:
SELECT line_name,CONVERT_TZ(datetime,'UTC',timezone) datetime,... FROM (SELECT line_name,datetime,... FROM ((SELECT line_id,assembly_datetime datetime,... FROM table_product_stereos WHERE company_id=# ) UNION (SELECT line_id,assembly_datetime datetime,... FROM table_product_televisions WHERE company_id=# )) AS union_products INNER JOIN table_manufacture_line USING (line_id)) AS products INNER JOIN (SELECT timezone FROM table_companies WHERE company_id=# ) AS tz ORDER BY datetime DESC LIMIT 0,100
Here it is formatted for some readability.
SELECT line_name,CONVERT_TZ(datetime,'UTC',tz.timezone) datetime,...
FROM (SELECT line_name,datetime,...
FROM (SELECT line_id,assembly_datetime datetime,...
FROM table_product_stereos WHERE company_id=#
UNION
SELECT line_id,assembly_datetime datetime,...
FROM table_product_televisions
WHERE company_id=#
) AS union_products
INNER JOIN table_manufacture_line USING (line_id)
) AS products
INNER JOIN (SELECT timezone
FROM table_companies
WHERE company_id=#
) AS tz
ORDER BY datetime DESC LIMIT 0,100
IDs are indexed; the primary key is the first key on each table.
Let's build this query up from its component parts to see what we can optimize.
Observation: you're fetching the 100 most recent rows from the union of two large product tables.
So, let's start by trying to optimize the subqueries fetching stuff from the product tables. Here is one of them.
SELECT line_id,assembly_datetime datetime,...
FROM table_product_stereos
WHERE company_id=#
But look, you only need the 100 newest entries here. So, let's add
ORDER BY assembly_datetime DESC
LIMIT 100
to this query. Also, you should put a compound index on this table as follows. This will allow both the WHERE and ORDER BY lookups to be satisfied by the index.
CREATE INDEX id_date ON table_product_stereos (company_id, assembly_datetime)
All the same considerations apply to the query from table_product_televisions. Order it by the time, limit it to 100, and index it.
If you need to apply other selection criteria, you can put them in these inner queries. For example, in a comment you mentioned a selection based on a substring search. You could do this as follows
SELECT t.line_id,t.assembly_datetime datetime,...
FROM table_product_stereos AS t
JOIN table_manufacture_line AS m ON m.line_id = t.line_id
AND m.line_name LIKE '%test'
WHERE company_id=#
ORDER BY assembly_datetime DESC
LIMIT 100
Next, you are using UNION to combine those two query result sets into one. UNION has the function of eliminating duplicates, which is time-consuming. (You know you don't have duplicates, but MySQL doesn't.) Use UNION ALL instead.
Putting this all together, the innermost sub query becomes this. We have to wrap up the subqueries because SQL is confused by UNION and ORDER BY clauses at the same query level.
SELECT * FROM (
SELECT line_id,assembly_datetime datetime,...
FROM table_product_stereos
WHERE company_id=#
ORDER BY assembly_datetime DESC
LIMIT 100
) AS st
UNION ALL
SELECT * FROM (
SELECT line_id,assembly_datetime datetime,...
FROM table_product_televisions
WHERE company_id=#
ORDER BY assembly_datetime DESC
LIMIT 100
) AS tv
That gets you 200 rows. It should get those rows fairly quickly.
200 rows are guaranteed to be enough to give you the 100 most recent items later on after you do your outer ORDER BY ... LIMIT operation. But that operation only has to crunch 200 rows, not 100K+, so it will be far faster.
Finally wrap up this query in your outer query material. Join the table_manufacture_line information, and fix up the timezone.
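For concreteness, a sketch of that final shape, keeping the column elisions (...) from your original query:

SELECT line_name, CONVERT_TZ(datetime,'UTC',tz.timezone) datetime, ...
FROM (
-- the UNION ALL of the two LIMIT 100 subqueries shown above
) AS union_products
INNER JOIN table_manufacture_line USING (line_id)
INNER JOIN (SELECT timezone FROM table_companies WHERE company_id=#) AS tz
ORDER BY datetime DESC
LIMIT 0,100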
If you do the indexing and the ORDER BY ... LIMIT operation earlier, this query should become very fast.
The comment dialog in your question indicates to me that you may have multiple product types, not just two, and that you have complex selection criteria for your paged display. Using UNION ALL on large numbers of rows slams performance: it converts multiple indexed tables into an internal list of rows that simply can't be searched efficiently.
You really should consider putting your two kinds of product data in a single table instead of having to UNION ALL multiple product tables. The setup you have now is inflexible and won't scale up easily. If you structure your schema with a master product table and perhaps some attribute tables for product-specific information, you will find yourself much happier two years from now. Seriously. Please consider making the change.
Remember: index fast, data slow. Use joins over nested queries. Nested queries return all of the data fields, whereas joins just consider the filters (which should all be indexed - make sure there's a unique index on table_product_*.line_id). It's been a while, but I'm pretty sure you can join ON company_id=#, which should cut down the results early on.
In this case, all of the results refer to the same company (or a much smaller subset) so it makes sense to run that query separately (and it makes the query more maintainable).
So your data source would be:
(table_product_stereos as prod
INNER JOIN table_manufacture_line AS ml ON prod.line_id = ml.line_id and prod.company_id=#
UNION
table_product_televisions as prod
INNER JOIN table_manufacture_line as ml on prod.line_id = ml.line_id and prod.company_id=#)
From which you can select prod.* or ml.* fields as required.
PHP is not a solution at all...
Redis can be a solution.
But the main thing I would change is the index creation for the tables (add the missing indexes)... If you're running into temp tables, you didn't set up the indexes for the tables well. And 100k rows is not much at all.
But I can't help you without the table creation statements as well as the queries you run.
Make sure your "where part" is covered by your BTREE index from left to right.
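To illustrate the left-to-right rule, a hedged sketch (index name and column order are assumptions to verify with EXPLAIN):

-- A BTREE index is usable when the WHERE clause matches its leftmost columns:
-- (company_id) alone, or (company_id, line_id), but not line_id by itself.
CREATE INDEX idx_company_line ON table_product_stereos (company_id, line_id);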

mysql calculate percentage between two sub queries

I am trying to work out the percentage of a number of students who meet certain criteria.
I have 3 separate tables that I need to get data from, and then I need to get the total from one table (student) as the total of students.
Then I need to use this total to divide the COUNT of students from the 2nd query.
So basically I am trying to get a count of ALL the students that are in the DB first.
Then count the number of students that appear in my main query (the one returning the data).
Then I need to take the noOfStudents (2), divide it by the main total (24, the number of students in the DB), and multiply by 100 to give the percentage of students who have met the criteria in the main query.
This is what I have so far:
SELECT * FROM (
(
SELECT s.firstname, s.lastname, s.RegistrationDate, s.Email, d.ReviewDate,(r.description) AS "Viva" , COUNT(*) AS "No of Students"
FROM student s
INNER JOIN dates d
ON s.id=d.student_identifier
INNER JOIN reviews r
ON d.review_Identifier=r.id
WHERE r.description = "Viva Date"
GROUP BY s.student_identifier
ORDER BY s.student_identifier)
) AS Completed
WHERE Completed.ReviewDate BETWEEN '2012-01-01' AND '2014-12-01'
;
I need to output the fields following the second SELECT and this data in turn will be displayed via PHP/HTML code on a page (the BETWEEN dates will be sent via '%s').
I wondered if I should be using 2 separate queries and then feeding the value (24) from the first query into the calculation in the second, but I have not been able to work out how to save them as 2 separate queries and then reference the first one.
I am also not sure if it is possible to display an overall % total at the same time as outputting the individual rows that meet the criteria?
I am trying to teach myself SQL, so I apologise if I have made any glaring mistakes/assumptions in any of the above, and would appreciate any advice that's out there.
Thank you.
Could you do this?
SELECT COUNT(*) AS TotalPopulation,
SUM(d.student_identifier = 'student') AS TotalStudents,
SUM(d.student_identifier = 'student') / COUNT(*) * 100 AS PercentageOfStudents
FROM student s
INNER JOIN dates d
ON s.id = d.student_identifier
INNER JOIN reviews r
ON r.id = d.review_Identifier
WHERE d.ReviewDate BETWEEN '2012-01-01' AND '2014-12-01' AND r.description = 'Viva Date';
You do not necessarily need first name and last name if you are just looking for counts.
This gets the COUNT(*) of the table, then sums the flag that identifies a matching student in the second column. (Note SUM rather than COUNT here: COUNT would count every non-NULL 0/1 result, so it would count all rows.) You previously had the query grouped, which could give you wrong results considering how much else is in your SELECT before aggregation.
You could also try:
SELECT d.student_identifier, s.firstname, s.lastname,
s.RegistrationDate, s.Email, d.ReviewDate,(r.description) AS "Viva"
FROM student s
INNER JOIN dates d
ON s.id=d.student_identifier
INNER JOIN reviews r
ON d.review_Identifier=r.id
WHERE r.description = "Viva Date" and d.ReviewDate BETWEEN '2012-01-01' AND '2014-12-01'
ORDER BY s.student_identifier
Now, if you want to return a list, that's the second one; if you want to return a count, you would use the first query, adjusted to your student_identifier.
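To answer the last part of the question - showing the individual rows and the overall percentage at the same time - one hedged option is to compute the two counts once in a derived table and CROSS JOIN it onto the row list. A sketch reusing the question's tables (untested against the real schema):

SELECT s.firstname, s.lastname, s.RegistrationDate, s.Email, d.ReviewDate,
r.description AS "Viva",
t.matched / t.total * 100 AS PercentTotal
FROM student s
INNER JOIN dates d ON s.id = d.student_identifier
INNER JOIN reviews r ON d.review_Identifier = r.id
CROSS JOIN (
SELECT (SELECT COUNT(*) FROM student) AS total,
COUNT(DISTINCT s2.id) AS matched
FROM student s2
INNER JOIN dates d2 ON s2.id = d2.student_identifier
INNER JOIN reviews r2 ON d2.review_Identifier = r2.id
WHERE r2.description = 'Viva Date'
AND d2.ReviewDate BETWEEN '2012-01-01' AND '2014-12-01'
) AS t
WHERE r.description = 'Viva Date'
AND d.ReviewDate BETWEEN '2012-01-01' AND '2014-12-01';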

A MySQL query, some normalized tables, issues w/ counts from the normalized table rows (from PHP)

Update: Added Schema to the bottom...
I have a table of contracts: tbl_contract
And a table of users associated with the contract: tbl_contract2user
There can be any number of entries in tbl_contract2user, in which an entry existing means that the relationship exists (along with a pending column where 1 = pending and 0 = approved).
My goal here is to select all contracts that have 1 (or more) active users within the time frame specified (see below).
The problem I'm having is sorting these contracts out properly. The date range is working fine... For some reason I'm having trouble with the case where the number of users is 1 or more (vs. 0) - and yes, I'll be working with that data set (after the query).
See below for the start of the query...
$result = mysql_query("SELECT tbl_contract.id
FROM tbl_contract
LEFT JOIN tbl_contract2user ON tbl_contract.id = tbl_contract2user.contractID
WHERE tbl_contract2user.pending = 0
AND tbl_contract.startDate <= {$billing['start_time']}
AND tbl_contract.endDate >= {$billing['end_time']}");
Schema:
tbl_contract: id, startDate, endDate, value, name, dateCreated
tbl_contract2user: id, contractID, userID, pending
What is the actual problem?
Do you get all records instead of only those with a related user? If yes, turn the LEFT JOIN into an INNER JOIN and all contracts without a relation are gone...
The real issue is that if I have 6 users in one contract, I get 6 rows
returned instead of ONE row for that contract
This is exactly what a JOIN does. It takes all records from the left side and joins them with the records on the right side by using a specific condition. If you only want to know how many users a contract has, you can use a GROUP BY clause and a COUNT(*):
SELECT tbl_contract.id, COUNT(*) AS userCount
FROM tbl_contract
LEFT JOIN tbl_contract2user ON tbl_contract.id = tbl_contract2user.contractID
WHERE tbl_contract2user.pending = 0
AND tbl_contract.startDate <= {$billing['start_time']}
AND tbl_contract.endDate >= {$billing['end_time']}
GROUP BY tbl_contract.id
If you need more information about the user, you really need all these 6 rows...
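As an aside: with WHERE tbl_contract2user.pending = 0, the LEFT JOIN above behaves like an INNER JOIN, because NULL rows from the right side never satisfy the filter. If you ever want contracts with zero approved users to appear with a count of 0, a hedged variant moves the pending test into the ON clause and counts a right-side column instead of *:

SELECT tbl_contract.id, COUNT(tbl_contract2user.id) AS userCount
FROM tbl_contract
LEFT JOIN tbl_contract2user
ON tbl_contract.id = tbl_contract2user.contractID
AND tbl_contract2user.pending = 0
WHERE tbl_contract.startDate <= {$billing['start_time']}
AND tbl_contract.endDate >= {$billing['end_time']}
GROUP BY tbl_contract.id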
