Optimizing an SQL query to get data from a large MySQL database - php

I am having a problem getting data from a large MySQL database.
The code below works fine on our test server, which has 10K patients and 5K appointments.
However, on our live server there are over 100K patients and over 300K appointments, and after running for a while the code fails with a 500 error.
I need the list of patients whose patient_treatment_status is 1 or 3 and who have no appointment within one month after their last appointment. (The code below works for a small number of patients and appointments.)
How can I optimise the first database query so that the second database query inside the foreach loop is no longer needed?
<?php
ini_set('memory_limit', '-1');
ini_set('max_execution_time', 0);
require_once('Db.class.php');
$patients = $db->query("
SELECT
p.id, p.first_name, p.last_name, p.phone, p.mobile,
LatestApp.lastAppDate
FROM
patients p
LEFT JOIN (SELECT patient_id, MAX(start_date) AS lastAppDate FROM appointments WHERE appointment_status = 4) LatestApp ON p.id = LatestApp.patient_id
WHERE
p.patient_treatment_status = 1 OR p.patient_treatment_status = 3
ORDER BY
p.id
");
foreach ($patients as $row) {
$one_month_after_the_last_appointment = date('Y-m-d', strtotime($row['lastAppDate'] . " +1 month"));
$appointment_check = $db->single("SELECT COUNT(id) FROM appointments WHERE patient_id = :pid AND appointment_status = :a0 AND (start_date >= :a1 AND start_date <= :a2)", array("pid"=>"{$row['id']}","a0"=>"1","a1"=>"{$row['lastAppDate']}","a2"=>"$one_month_after_the_last_appointment"));
if($appointment_check == 0){
echo $patient_id = $row['id'].' - '.$row['lastAppDate'].' - '.$one_month_after_the_last_appointment. '<br>';
}
}
?>

First off, this subquery likely does not do what you think it does.
SELECT patient_id, MAX(start_date) AS lastAppDate
FROM appointments WHERE appointment_status = 4
Without a GROUP BY clause, that subquery will simply take the maximum start_date of all appointments with appointment_status=4, and then arbitrarily pick one patient_id. To get the results you want you'll need to GROUP BY patient_id.
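A minimal sketch of that difference, using Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows are made up, and SQLite is assumed to behave like MySQL without ONLY_FULL_GROUP_BY in accepting the ungrouped form:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE appointments (
    patient_id INTEGER, appointment_status INTEGER, start_date TEXT);
INSERT INTO appointments VALUES
    (1, 4, '2015-01-10'), (1, 4, '2015-02-01'),
    (2, 4, '2015-01-20'), (2, 4, '2015-03-15');
""")

# No GROUP BY: the whole table collapses into a single row.
ungrouped = conn.execute("""
    SELECT patient_id, MAX(start_date) AS lastAppDate
    FROM appointments WHERE appointment_status = 4
""").fetchall()

# GROUP BY patient_id: one row per patient, each with that patient's latest date.
grouped = conn.execute("""
    SELECT patient_id, MAX(start_date) AS lastAppDate
    FROM appointments WHERE appointment_status = 4
    GROUP BY patient_id ORDER BY patient_id
""").fetchall()

print(ungrouped)
print(grouped)
```

Only the grouped form gives the LEFT JOIN one row per patient to match against.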
For your overall question, try the following query:
SELECT
p.id, p.first_name, p.last_name, p.phone, p.mobile,
LatestApp.lastAppDate
FROM
patients p
INNER JOIN (
SELECT patient_id,
MAX(start_date) AS lastAppDate
FROM appointments
WHERE appointment_status = 4
GROUP BY patient_id
) LatestApp ON p.id = LatestApp.patient_id
WHERE
(p.patient_treatment_status = 1
OR p.patient_treatment_status = 3)
AND NOT EXISTS (
SELECT 1
FROM appointments a
WHERE a.patient_id = p.id
AND a.appointment_status = 1
AND a.start_date >= LatestApp.lastAppDate
AND a.start_date < DATE_ADD(LatestApp.lastAppDate,INTERVAL 1 MONTH)
)
ORDER BY
p.id
Add the following index, if it doesn't already exist:
ALTER TABLE appointments
ADD INDEX (`patient_id`, `appointment_status`, `start_date`)
Report how this performs and whether the data appears correct. Provide the output of SHOW CREATE TABLE patients and SHOW CREATE TABLE appointments for further help with performance.
Also, try the query above without the AND NOT EXISTS clause, together with the second query you use. It is possible that running two queries is faster than combining them in this situation.
Note that I used an INNER JOIN to find the latest appointment. As a result, patients who have never had an appointment are not included in the results. If you need them, UNION these results with a query that selects the patients who have never had an appointment.
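To sanity-check the single-query approach on a small scale, here is a sketch that runs the same shape of query against an in-memory SQLite database through Python's sqlite3 module. The sample rows are invented, the reading of the status codes (4 as a finished visit, 1 as a booked one) is an assumption, and SQLite's date(x, '+1 month') stands in for MySQL's DATE_ADD:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (id INTEGER PRIMARY KEY, patient_treatment_status INTEGER);
CREATE TABLE appointments (
    id INTEGER PRIMARY KEY, patient_id INTEGER,
    appointment_status INTEGER, start_date TEXT);
INSERT INTO patients VALUES (1, 1), (2, 3), (3, 2), (4, 1);
INSERT INTO appointments (patient_id, appointment_status, start_date) VALUES
    (1, 4, '2015-01-10'),  -- patient 1: no follow-up, should be listed
    (2, 4, '2015-01-05'),
    (2, 1, '2015-01-20'),  -- patient 2: follow-up inside the month
    (3, 4, '2015-01-07');  -- patient 3: treatment status 2, filtered out
""")

rows = conn.execute("""
    SELECT p.id, la.lastAppDate
    FROM patients p
    INNER JOIN (
        SELECT patient_id, MAX(start_date) AS lastAppDate
        FROM appointments
        WHERE appointment_status = 4
        GROUP BY patient_id
    ) la ON p.id = la.patient_id
    WHERE p.patient_treatment_status IN (1, 3)
      AND NOT EXISTS (
          SELECT 1
          FROM appointments a
          WHERE a.patient_id = p.id
            AND a.appointment_status = 1
            AND a.start_date >= la.lastAppDate
            AND a.start_date < date(la.lastAppDate, '+1 month')
      )
    ORDER BY p.id
""").fetchall()

print(rows)
```

Patient 4 has no appointments at all and is dropped by the INNER JOIN, which is the case the UNION suggestion above is meant to cover.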

Related

Sum columns on different tables and multiply by value of a column on another table

I need to compute employees' monthly salaries based on meetings attended, deductions and bonuses given.
Employees have different pay per meeting based on their job position.
The formula is:
salary = (pay_per_meeting * meetings_attended) + bonuses - deductions;
I have five tables:
Jobs: Id, title, pay_per_meeting
Employees: Id, Name, job_id
Bonuses: Id, amount, employee_id, date
Deductions: Id, amount, employee_id, date
Meetings: Id, employee_id, date
SELECT
COUNT(meetings.employee_id) as meetings_attended,
COUNT(deductions.amount) as debt,
COUNT(bonuses.amount) bonus,
(SELECT jobs.pay_per_attendance from jobs where jobs.id = (select job_id from employees where id=meetings.employee_id)) as pay,
((meetings_attended * pay) + bonus - debt) as salary
FROM meetings
JOIN deductions ON deductions.employee_id = meetings.employee_id
JOIN bonuses ON bonuses.employee_id = meetings.employee_id
WHERE meetings.employee_id = 1
GROUP BY MONTH(meetings.date), MONTH(deductions.date), MONTH(bonuses.date)
The above query returns many incorrect values when I remove the salary line, and with that line it fails with an unknown-column error for pay, meetings_attended, debt and bonus. I am sure something is wrong with the grouping, but I just can't see it.
You can't refer to column aliases in the same select list as they're defined, you need to refer to the underlying column. And a subquery can't access an aggregate calculated in the main query. You need to repeat the aggregate expression, or move everything into a subquery and do the calculation with it in an outer query.
Also, all your COUNT() expressions are going to return the same thing, since they're just counting rows (I assume none of the values can be NULL). You probably want COUNT(DISTINCT <column>) to get different counts, and you need to use a column that's unique, so they should be the primary key column, e.g. COUNT(DISTINCT deductions.id).
Another problem is that when you try to sum and count values when you have multiple joins, you end up with a result that's too high, because rows get duplicated in the cross product of all the tables. See Join tables with SUM issue in MYSQL. The solution is to calculate the sums from each table in subqueries.
SELECT m.month, m.meetings_attended, d.debt, b.bonus,
m.meetings_attended * j.pay_per_meeting + b.amount - d.amount AS salary
FROM (
SELECT MONTH(date) AS month, COUNT(*) AS meetings_attended
FROM meetings
WHERE employee_id = 1
GROUP BY month) AS m
JOIN (
SELECT MONTH(date) AS month, COUNT(*) AS bonus, SUM(amount) AS amount
FROM bonuses
WHERE employee_id = 1
GROUP BY month) AS b ON m.month = b.month
JOIN (
SELECT MONTH(date) AS month, COUNT(*) AS debt, SUM(amount) AS amount
FROM deductions
WHERE employee_id = 1
GROUP BY month) AS d ON m.month = d.month
CROSS JOIN employees AS e
JOIN jobs AS j ON j.id = e.job_id
WHERE e.id = 1
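The row duplication described above can be reproduced on a toy dataset. This is only a sketch, run on SQLite through Python's sqlite3 module rather than on MySQL, with made-up rows and the date columns left out for brevity:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE meetings (id INTEGER PRIMARY KEY, employee_id INTEGER);
CREATE TABLE bonuses (id INTEGER PRIMARY KEY, employee_id INTEGER, amount INTEGER);
CREATE TABLE deductions (id INTEGER PRIMARY KEY, employee_id INTEGER, amount INTEGER);
INSERT INTO meetings (employee_id) VALUES (1), (1), (1);
INSERT INTO bonuses (employee_id, amount) VALUES (1, 100), (1, 50);
INSERT INTO deductions (employee_id, amount) VALUES (1, 20);
""")

# Joining all three tables first yields 3 * 2 * 1 = 6 rows, so every
# aggregate is computed over duplicated rows.
naive = conn.execute("""
    SELECT COUNT(m.id), SUM(b.amount), SUM(d.amount)
    FROM meetings m
    JOIN bonuses b ON b.employee_id = m.employee_id
    JOIN deductions d ON d.employee_id = m.employee_id
    WHERE m.employee_id = 1
""").fetchone()

# Aggregating each table in its own subquery avoids the duplication.
per_table = conn.execute("""
    SELECT m.n, b.total, d.total
    FROM (SELECT COUNT(*) AS n FROM meetings WHERE employee_id = 1) m
    CROSS JOIN (SELECT SUM(amount) AS total FROM bonuses WHERE employee_id = 1) b
    CROSS JOIN (SELECT SUM(amount) AS total FROM deductions WHERE employee_id = 1) d
""").fetchone()

print(naive)      # inflated values
print(per_table)  # the real counts and sums
```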

How can I Sum / Group By the results of this table

I have a table of hours which looks like :
I want to sum the hours_spent results for this week only and group the results by the created_by person. I have this query which returns the correct data for showing only results in this week :
SELECT staff_id, first_name, last_name, date_entered, `hours_spent` as total_hours FROM hours LEFT JOIN staff ON hours.created_by = staff.staff_id where yearweek(`date_entered`) = yearweek(curdate());
But when I add the SUM(hours_spent) as total_hours and group by staff_id like the example below I get 0 results.
SELECT staff_id, date_entered, first_name, last_name, SUM(`hours_spent`) as total_hours FROM hours LEFT JOIN staff ON hours.created_by = staff.staff_id group by staff_id having yearweek(`date_entered`) = yearweek(curdate());
I'm assuming it's not working because the HAVING part of my statement no longer sees the individual rows of dates, so it breaks.
I feel like I am doing this the hard way. Should I run a second summing query on the results of the first query rather than combining it all into one (I was hoping for cleanliness)? Or should I use a subquery to filter out the dates that aren't in this week and then group the totals? If so, how could I accomplish this?
I got what I was expecting with:
SELECT staff.first_name,staff.last_name, sum(hours_spent)
FROM hours
LEFT JOIN staff ON hours.created_by = staff.staff_id
WHERE yearweek(date_entered,1) = (yearweek(curdate(),1)-1)
GROUP BY created_by
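The key point is that WHERE filters rows before they are grouped, whereas HAVING is applied to the groups afterwards. A small sketch of the filter-then-group pattern, using SQLite through Python's sqlite3 module with invented rows; a fixed date range replaces yearweek(curdate()) so that the result does not depend on the day it is run:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hours (created_by INTEGER, date_entered TEXT, hours_spent REAL);
INSERT INTO hours VALUES
    (1, '2015-03-02', 2.0), (1, '2015-03-04', 3.5), (2, '2015-03-03', 1.0),
    (1, '2015-02-20', 8.0), (2, '2015-02-25', 4.0);
""")

# Hypothetical week boundaries: start inclusive, end exclusive.
week_start, week_end = "2015-03-02", "2015-03-09"

totals = conn.execute("""
    SELECT created_by, SUM(hours_spent) AS total_hours
    FROM hours
    WHERE date_entered >= ? AND date_entered < ?   -- filter rows first
    GROUP BY created_by                            -- then sum per person
    ORDER BY created_by
""", (week_start, week_end)).fetchall()

print(totals)
```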

Join a query into another query with column computation

I have three tables named issue_details, nature_payments, and rci_records. Now I have this query which joins these three tables.
SELECT issue_details.issue_date AS Date,
issue_details.check_no AS Check_No,
payees.payee_name AS Name_payee,
nature_payments.nature_payment AS Nature_of_Payment,
issue_details.issue_amount AS Checks_issued,
issue_details.nca_balance AS Nca_balance
FROM
issue_details
INNER JOIN
nature_payments ON
issue_details.nature_id = nature_payments.nature_id
INNER JOIN
payees ON
issue_details.payee_id = payees.payee_id
ORDER BY Date Asc, Check_no ASC
The Nca_balance column holds the computed running difference after every check issuance. You may not know the details of how I get that difference, but to keep it simple, let's say that I have another query
that also computes the difference for this nca_balance column dynamically. Here is the query:
SELECT r.*,
(@tot := @tot - issue_amount) as bank_balance
FROM (SELECT @tot := SUM(nca_amount) as nca_total FROM nca
WHERE account_type = 'DBP-TRUST' AND
year(issue_date) = year('2015-01-11') AND
month(issue_date) = month('2015-01-11')
)
vars CROSS JOIN issue_details r
WHERE r.account_type = 'DBP-TRUST' AND
r.issue_date = '2015-01-11'
ORDER BY r.issue_date, r.check_no
I know you may not get my point, but I just want to replace this line of the first query:
issue_details.nca_balance AS Nca_balance
with the computation from my second query.
Please help me combine those two queries into a single query. Thanks
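One possible way to fold the running balance into a single query, sketched here with a correlated subquery instead of a user variable. This is an illustration under assumptions, not a tested answer for the real schema: it runs on SQLite through Python's sqlite3 module, the tables are cut down to the columns the balance needs, the rows are invented, and the month and year filter on nca is left out:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nca (account_type TEXT, nca_amount INTEGER);
CREATE TABLE issue_details (
    issue_date TEXT, check_no INTEGER, issue_amount INTEGER, account_type TEXT);
INSERT INTO nca VALUES ('DBP-TRUST', 1000), ('OTHER', 50);
INSERT INTO issue_details VALUES
    ('2015-01-11', 101, 200, 'DBP-TRUST'),
    ('2015-01-11', 102, 300, 'DBP-TRUST'),
    ('2015-01-12', 103, 100, 'DBP-TRUST'),
    ('2015-01-11', 201, 999, 'OTHER');
""")

balances = conn.execute("""
    SELECT r.check_no,
           r.issue_amount,
           -- starting total for the account ...
           (SELECT SUM(n.nca_amount) FROM nca n
             WHERE n.account_type = r.account_type)
           -- ... minus every check issued up to and including this one
           - (SELECT SUM(r2.issue_amount) FROM issue_details r2
               WHERE r2.account_type = r.account_type
                 AND (r2.issue_date < r.issue_date
                      OR (r2.issue_date = r.issue_date
                          AND r2.check_no <= r.check_no))) AS bank_balance
    FROM issue_details r
    WHERE r.account_type = 'DBP-TRUST'
    ORDER BY r.issue_date, r.check_no
""").fetchall()

print(balances)
```

Because the balance is now an ordinary column expression, the same two subqueries could be placed in the first query's select list in place of issue_details.nca_balance. The correlated subquery is re-evaluated per row, so it may be slow on large tables.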

SQL command to get the count from an average (ie what is n)

Two tables:
1. stories: one column lists over 10,000 story titles, with other columns including author, date, category etc. 'id' is the column that is a unique identifier (auto-incrementing for each story).
2. ratings: this table records the star ranking for each of the stories. It has 3 columns: an auto-incrementing unique id, the id from table number 1 (which in table 2 is called storyidr), and the rank value.
So I would like to report the average rating and the total number of ratings for each story.
I've used an SQL JOIN and I can get the average rating to report fine.
SELECT s.*,
ROUND(AVG(r.rank),0)
AS avrank
FROM stories s
LEFT JOIN ratings
AS r
ON r.storyidr = s.id
GROUP BY s.id
ORDER BY RAND()
LIMIT 200;
Getting the count is another story. I'm trying COUNT and UNION. Nothing is working. Is there a way to 'extract' the value of n from the average value that is already being queried,
knowing that average = sum / n?
I don't have to do it this way. If I could add additional SQL to the current query to get the count, that would be just fine. I'm just not seeing how to add the count function to the current script.
With suggestions:
$query = "SELECT s.*, COUNT(r.rank) AS rkct ROUND(AVG(r.rank),0) AS avrank FROM stories s LEFT JOIN ratings AS r ON r.storyidr = s.id GROUP BY s.id ORDER BY RAND() LIMIT 5;";
$result = mysqli_query($connection, $query);
<?php while ($data = mysqli_fetch_assoc($result)):
$id = $data['id'];
$author = $data['author'];
$email = $data['email'];
$title = $data['title'];
$img_link = $data['img_link'];
$smaller_img = $data['smaller_img'];
$story_link = $data['story_link'];
$page_path = $data['page_path'];
$tag_alt = $data['tag_alt'];
$category = $data['category'];
$avgrate = $data['avrank'];
$rankcount = $data['rkct'];
The suggestions are giving me the same error: Warning: mysqli_fetch_assoc() expects parameter 1 to be mysqli_result, boolean given in intro.php on line 88.
This is line 88: $avgratep = "Avg rating: " . $avgrate . "/5";
It seems like adding the count is making the avrank value empty or non-numeric?
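The "boolean given" warning means mysqli_query() returned false, i.e. the query itself failed: in the version above there is no comma between the COUNT(...) and ROUND(...) expressions. Just call the count function in the same way you call avg, as another comma-separated column. Counting r.rank rather than * makes a story with no ratings report 0 instead of 1:
SELECT s.*,
ROUND(AVG(r.rank),0) AS avrank,
COUNT(r.rank) AS countrank
FROM stories s
LEFT JOIN ratings
AS r
ON r.storyidr = s.id
GROUP BY s.id
ORDER BY RAND()
LIMIT 200;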
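How COUNT(*) and COUNT(r.rank) behave under a LEFT JOIN can be checked on a few rows. A sketch using SQLite through Python's sqlite3 module, with invented data and the stories table reduced to two columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stories (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE ratings (id INTEGER PRIMARY KEY, storyidr INTEGER, rank INTEGER);
INSERT INTO stories VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO ratings (storyidr, rank) VALUES (1, 3), (1, 5), (2, 3);
""")

rows = conn.execute("""
    SELECT s.id,
           COUNT(*)             AS joined_rows,  -- counts the NULL-padded row too
           COUNT(r.rank)        AS rating_count, -- counts only non-NULL ranks
           ROUND(AVG(r.rank), 0) AS avrank
    FROM stories s
    LEFT JOIN ratings AS r ON r.storyidr = s.id
    GROUP BY s.id
    ORDER BY s.id
""").fetchall()

for row in rows:
    print(row)
```

Story 3 has no ratings: its joined_rows value is 1 while rating_count is 0, and avrank comes back as NULL (None in Python).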

MySQL Need to show 0 when no value in right hand table of join, with cumulative

I have one table with the number of sales per month for each product code and another with a list of months that can extend before or after the months that had a sale. I need the results to show 0 sales if there were no sales in the month, and the cumulative total to carry through those months. I have tried using CASE and IF to output 0 when sales.sales is NULL, but this did not work and I still just got blanks.
create table summary as (SELECT
q1.productid As productid,
q1.date AS Month_View,
q1.sales AS Monthly_Units_Sold,
(@runtot_sales := @runtot_sales + q1.sales) AS Cumulative_Sales
FROM
(SELECT
sales.productid,
dates.date,
if(sales.date is null,0,sales.sales) as sales
from
dates
left join sales on dates.date = sales.date
where
sales.productid = '$input1'
group by dates.date
ORDER BY date) AS q1);
Try the COALESCE() function, which returns the first non-NULL value in its argument list.
CREATE TABLE summary AS
(SELECT
q1.productid AS productid,
q1.date AS Month_View,
q1.sales AS Monthly_Units_Sold,
(
@runtot_sales := @runtot_sales + q1.sales
) AS Cumulative_Sales
FROM
(SELECT
sales.productid,
dates.date,
COALESCE(sales.sales, 0) AS sales
FROM
dates
LEFT JOIN sales
ON dates.date = sales.date
WHERE sales.productid = '$input1'
GROUP BY dates.date
ORDER BY DATE) AS q1) ;
MySQL COALESCE() function
You are misusing GROUP BY and therefore getting indeterminate results. See this: http://dev.mysql.com/doc/refman/5.5/en/group-by-extensions.html
If you're aggregating your items by product and date you probably want something like this.
SELECT sales.productid,
dates.date,
COALESCE(SUM(sales.sales), 0) as sales
FROM dates
LEFT JOIN sales ON dates.date = sales.date
AND sales.productid = '$input1'
GROUP BY sales.productid, dates.date
ORDER BY dates.date /* i'm not sure what you're trying to do with the running total */
Note that SUM(sales.sales) skips NULL values, but if a date doesn't join any sales row the sum itself is NULL, so COALESCE(..., 0) is needed to show 0. The productid condition also has to sit in the ON clause: in the WHERE clause it discards the unmatched rows that the LEFT JOIN produces, and sales.productid is NULL on those rows.
If you're trying to do a month-by-month summary you need more logic than you have. See this writeup: http://www.plumislandmedia.net/mysql/sql-reporting-time-intervals/
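The effect of where the productid filter sits can be checked on a toy dataset. This sketch uses SQLite through Python's sqlite3 module with invented rows, and it builds the cumulative column in Python with itertools.accumulate rather than with a MySQL user variable:

```python
import sqlite3
from itertools import accumulate

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dates (date TEXT PRIMARY KEY);
CREATE TABLE sales (productid TEXT, date TEXT, sales INTEGER);
INSERT INTO dates VALUES
    ('2015-01-01'), ('2015-02-01'), ('2015-03-01'), ('2015-04-01');
INSERT INTO sales VALUES
    ('P1', '2015-02-01', 5), ('P1', '2015-04-01', 2), ('P2', '2015-03-01', 9);
""")

# Filter in WHERE: months without a P1 sale disappear from the result.
where_rows = conn.execute("""
    SELECT d.date, COALESCE(SUM(s.sales), 0)
    FROM dates d
    LEFT JOIN sales s ON s.date = d.date
    WHERE s.productid = ?
    GROUP BY d.date ORDER BY d.date
""", ("P1",)).fetchall()

# Filter in ON: every month is kept, and COALESCE turns the NULL sum into 0.
on_rows = conn.execute("""
    SELECT d.date, COALESCE(SUM(s.sales), 0)
    FROM dates d
    LEFT JOIN sales s ON s.date = d.date AND s.productid = ?
    GROUP BY d.date ORDER BY d.date
""", ("P1",)).fetchall()

# Running total over the ordered monthly values.
cumulative = list(accumulate(units for _, units in on_rows))

print(where_rows)
print(on_rows)
print(cumulative)
```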
