I have two tables: Exam (ExamID, Date, Modality) and CT(ctdivol, ExamID(FK)) with the attributes in brackets.
Note: CT table has about 100 000 entries.
I want to calculate the average of ctdivol in a specific interval of dates.
I have this code that works but is too slow:
function get_CTDIvolAVG($min, $max) {
$values = 0;
$number = 0;
$query = "SELECT (unix_timestamp(date)*1000), examID
from exam use index(dates)
where modality = 'CT'
AND (unix_timestamp(date)*1000) between '" . $min . "' AND '" . $max . "';";
$result = mysql_query($query) or die('Query failed: ' . mysql_error());
while($line = mysql_fetch_array($result, MYSQL_ASSOC)) {
$avg = "SELECT SUM(ctdivol_mGy), count(ctdivol_mGy)
from ct use index(ctd)
where examID ='" . $line["examID"] ."'
AND ctdivol_mGy>0;";
$result1 = mysql_query($avg) or die('Query failed: ' . mysql_error());
while ($ct = mysql_fetch_array($result1, MYSQL_ASSOC)) {
$values = $values + floatval($ct["SUM(ctdivol_mGy)"]);
$number = $number + floatval($ct["count(ctdivol_mGy)"]);
}
}
if ($number!=0) {
echo $values/$number;
}
}
How can I make it faster?
Use EXPLAIN to see the query execution plan.
For that first query, MySQL can't make effective use of an index range scan operation, because the expression in the WHERE clause wraps the column in a function and has to be evaluated for every row in the table. We get better performance when we compare against a bare column. Do the manipulation on the literal side: convert those values to the datatype of the column you're comparing to.
WHERE e.date BETWEEN expr1 AND expr2
For expr1, you need an expression that converts your $min value into a datetime. Just be careful of timezone conversions. I think this might do what you need for expr1:
FROM_UNIXTIME( $min /1000)
Something like:
WHERE e.date BETWEEN FROM_UNIXTIME( $min /1000) AND FROM_UNIXTIME( $max /1000)
Then we should see MySQL able to make effective use of an index with leading column of date. The EXPLAIN output should show range for the access type.
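The conversion can also be done on the PHP side before the query string is built. A minimal sketch, assuming exam.date holds UTC values; ms_to_mysql_datetime and the sample bounds are hypothetical names and values, not part of the original code:

```php
<?php
// Hypothetical helper (not from the original post): turn a millisecond Unix
// timestamp into a 'Y-m-d H:i:s' string for comparison against exam.date.
// Assumption: the column stores UTC; use date() instead of gmdate() if it
// stores the server's local time.
function ms_to_mysql_datetime(int $ms): string
{
    return gmdate('Y-m-d H:i:s', intdiv($ms, 1000));
}

// Hypothetical example values for the interval bounds.
$minDate = ms_to_mysql_datetime(0);          // '1970-01-01 00:00:00'
$maxDate = ms_to_mysql_datetime(86400000);   // '1970-01-02 00:00:00'

// The WHERE clause now compares the bare column, so a range scan is possible.
$where = "WHERE e.date BETWEEN '$minDate' AND '$maxDate'";
echo $where, "\n";
```

Either way works; the point is only that the column itself stays untouched in the comparison.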
If the number of columns being returned is a small subset, consider a covering index. Then the EXPLAIN will show "Using index", which means the query can be satisfied entirely from the index, with no lookups to pages in the underlying table.
Secondly, avoid running queries multiple times in a loop. It is usually more efficient to run a single query that returns a single resultset, because every statement carries overhead: sending the SQL to the database; parsing the SQL text for valid syntax (keywords in the right places) and valid semantics (identifiers reference valid objects); considering possible access paths and determining which is lowest cost; then executing the query plan, obtaining metadata locks, generating the resultset, returning it to the client, and cleaning up. It's not noticeable for a single statement, but when you run a lot of statements in a tight loop, it adds up. Couple that with an inefficient query, and it gets really noticeable.
If the examID column in exam is unique and not null (or it's the PRIMARY KEY of exam), then it looks like you could use a single query, like this:
SELECT UNIX_TIMESTAMP(e.date)*1000 AS `date_ts`
, e.examID AS `examID`
, SUM(ct.ctdivol_mGy) AS `SUM(ctdivol_mGy)`
, COUNT(ct.ctdivol_mGy) AS `count(ctdivol_mGy)`
FROM exam e
LEFT
JOIN ct
ON ct.examid = e.examID
AND ct.ctdivol_mGy > 0
WHERE e.modality = 'CT'
AND e.date >= FROM_UNIXTIME( $min /1000)
AND e.date <= FROM_UNIXTIME( $max /1000)
GROUP
BY e.modality
, e.date
, e.examID
ORDER
BY e.modality
, e.date
, e.examID
For best performance of that, you'd want covering indexes:
... ON exam (modality, date, examID)
... ON ct (examID, ctdivol_mGy)
We'd want to see the EXPLAIN output; we'd expect that MySQL could make use of the index on exam to do the GROUP BY (avoiding a "Using filesort" operation), and also make use of a ref operation on the index on ct.
To reiterate: that query requires that examID be the PRIMARY KEY of the exam table (or at least be guaranteed to be unique and non-null). Otherwise, the result can differ from that of the original code. Absent that guarantee, we could use either an inline view or subqueries in the SELECT list. But in terms of performance, we don't want to go there without good reason.
These are just some general ideas, not a hard and fast "this will be faster".
You can join the first table to a subquery (derived table) on examID:
$query = "SELECT (unix_timestamp(ed.date)*1000) AS time_calculation, ed.examID, inner_ct.inner_sum, inner_ct.inner_count"
. " FROM exam ed,"
. " ( SELECT SUM(ctdivol_mGy) AS inner_sum, COUNT(ctdivol_mGy) AS inner_count, examID"
. " FROM ct"
. " WHERE ctdivol_mGy>0"
. " GROUP BY examID ) inner_ct"
. " WHERE ed.modality = 'CT' AND (unix_timestamp(ed.date)*1000) BETWEEN"
. " '$min' AND '$max'"
. " AND ed.examID = inner_ct.examID";
The ( SELECT . . .) inner_ct creates a derived table you can join against. This is useful if you're selecting aggregated data (sums in your case) across a join. Note that a column alias such as time_calculation cannot be referenced in the WHERE clause, so the expression is repeated there.
Alternatively, you can use the following syntax:
$query = "SELECT (unix_timestamp(ed.date)*1000) AS time_calculation, ed.examID, inner_ct.inner_sum, inner_ct.inner_count"
. " FROM exam ed"
. " LEFT JOIN ( SELECT SUM(ctdivol_mGy) AS inner_sum, COUNT(ctdivol_mGy) AS inner_count, examID"
. " FROM ct"
. " WHERE ctdivol_mGy>0"
. " GROUP BY examID ) inner_ct"
. " ON ed.examID = inner_ct.examID"
. " WHERE ed.modality = 'CT' AND (unix_timestamp(ed.date)*1000) BETWEEN"
. " '$min' AND '$max'";
You have not provided sample data in the question so we resort to assumptions in an attempt to answer. If there is only one exam row for many rows in ct - but an exam row can exist that has no ct rows at all - then this single query should provide the results required.
SELECT
exam.examID
, (unix_timestamp(exam.date) * 1000)
, SUM(ct.ctdivol_mGy)
, COUNT(ct.ctdivol_mGy)
FROM exam
LEFT OUTER JOIN ct on exam.examID = ct.examID AND ct.ctdivol_mGy > 0
WHERE exam.modality = 'CT'
AND exam.date >= #min AND exam.date < #max
GROUP BY
exam.examID
, (unix_timestamp(exam.date) * 1000)
;
Note I am not attempting the PHP code, just concentrating on the SQL. I have used #min and #max to indicate the 2 dates required in the where clause. These should be of the same data type as the column exam.date so do those calculations in PHP before adding into the query string.
"I want to calculate the average of ctdivol in a specific interval of dates."
If you are trying to return a single figure, then this should help:
SELECT
AVG(ct.ctdivol_mGy)
FROM exam
INNER JOIN ct on exam.examID = ct.examID AND ct.ctdivol_mGy > 0
WHERE exam.modality = 'CT'
AND exam.date >= #min AND exam.date < #max
;
Note for this variant we probably don't need a left join (but again due to a lack of sample data and expected result that is an assumption).
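For the PHP side, a minimal sketch of running that single-figure query through PDO with bound parameters. It assumes the bounds are already strings in the same format as exam.date; ctdivol_average is a hypothetical name, and the PDO connection is assumed to exist already:

```php
<?php
// Hypothetical helper: average ctdivol_mGy for CT exams in [$minDate, $maxDate).
// Assumes $minDate/$maxDate are 'Y-m-d H:i:s' strings matching exam.date's type.
// Returns null when no CT rows fall inside the interval.
function ctdivol_average(PDO $pdo, string $minDate, string $maxDate): ?float
{
    $sql = 'SELECT AVG(ct.ctdivol_mGy) AS avg_ctdivol
              FROM exam
             INNER JOIN ct ON ct.examID = exam.examID AND ct.ctdivol_mGy > 0
             WHERE exam.modality = :modality
               AND exam.date >= :min_date
               AND exam.date <  :max_date';

    $statement = $pdo->prepare($sql);
    $statement->execute([
        ':modality' => 'CT',
        ':min_date' => $minDate,
        ':max_date' => $maxDate,
    ]);

    // AVG() over zero rows yields a single row containing NULL.
    $avg = $statement->fetchColumn();
    return ($avg === null || $avg === false) ? null : (float) $avg;
}
```

One round trip replaces the whole loop, and binding the values avoids concatenating them into the SQL text.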
I have a SELECT statement that pulls a limited number of items based on the value of one of the fields. (ie ORDER BY rate LIMIT 15).
However, I need to do some comparisons and change the value of rate, which could subsequently alter the results that I want.
I could pull everything (without the LIMIT), alter the rate, re-sort, and then just process the number that I need. However, I don't know if it's possible to alter values in a php result array. I'm using:
$query_raw = "SELECT dl.dragon_list_id, dl.dragon_id, dl.dragon_name, dl.dragon_level, d.type, d.opposite, d.image, dr.dragon_earn_rate
FROM dragon_list dl
LEFT JOIN dragons d ON d.dragon_id = dl.dragon_id
LEFT JOIN dragon_rates dr ON dr.dragon_id = dl.dragon_id
AND dr.dragon_level = dl.dragon_level
WHERE dl.dragon_id IN (
SELECT dragon_id
FROM dragon_elements
WHERE element_id = 3
)
AND dl.dragon_list_id NOT IN (
SELECT dh.dragon_list_id
FROM dragon_to_habitat dh, dragon_list dl
WHERE dl.user_id = 1
AND dh.dragon_list_id = dl.dragon_list_id
AND dl.is_deleted = 0
)
AND dl.user_id = " . $userid . "
AND dl.is_deleted = 0
ORDER BY dr.dragon_earn_rate DESC, dl.dragon_name
LIMIT 15;";
$d_query = mysqli_query($link, $query_raw);
if (!$d_query) {
echo "DB Error, could not query the database\n";
echo 'MySQL Error: ' . mysqli_error($link);
exit;
}
$d = mysqli_fetch_array($d_query);
Well, after a lot of research and some trial and error I found my answers....
Yes, I CAN alter the result rows using something like:
$result['field'] = $newvalue;
I also learned I could reset the pointer by using:
mysqli_data_seek($d_query,0);
However, when I reset the counter, I lost the changes I made. So ultimately, I'm still a little stuck, but individually I had the answers.
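One way around losing the changes, sketched under assumptions: copy the rows into a plain PHP array first (for example with mysqli_fetch_all($d_query, MYSQLI_ASSOC), which needs the mysqlnd driver), then adjust, re-sort and trim that array instead of the result set. rerank_rows and the $adjustRate callback are hypothetical names used only for illustration:

```php
<?php
// Hypothetical helper: apply a new rate to every row, re-sort, keep the top $limit.
// $rows is a plain array of associative rows, so changes persist, unlike
// rows re-read from the result set after mysqli_data_seek().
function rerank_rows(array $rows, callable $adjustRate, int $limit): array
{
    foreach ($rows as &$row) {
        $row['dragon_earn_rate'] = $adjustRate($row);
    }
    unset($row); // break the reference left behind by foreach

    usort($rows, function (array $a, array $b): int {
        // Rate descending, then name ascending, mirroring the ORDER BY clause.
        return [$b['dragon_earn_rate'], $a['dragon_name']]
           <=> [$a['dragon_earn_rate'], $b['dragon_name']];
    });

    return array_slice($rows, 0, $limit);
}
```

The query would then run without the LIMIT, and the limit of 15 would be passed to the helper instead.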
I am sure this is possible, but I think it may be very complex to write. I want to search every field by:
='SearchTerm'
then
Like %SearchTerm
then
like SearchTerm%
and finally
like %SearchTerm%
I want to run this on every field in my table, of which there are around 30 or 40. Is there an easy way to run this over multiple fields, or will I have to declare every single one?
I think I have seen a query before where different matches between %query %query% etc are ranked by assigning an integer value and then ordering by this. Would that be possible on a query like this?
Any advice and help in the right direction is much appreciated.
You should use fulltext indexing on the fields you want searched and use MATCH AGAINST instead of LIKE %%. It's much faster and returns results based on relevancy. More info here:
Mysql match...against vs. simple like "%term%"
I do something very similar to what you're describing (in php and mysql)
Here's my code:
$search = trim($_GET["search"]);
$searches = explode(" ",$search);
$sql = "SELECT *,wordmatch+descmatch+usagematch+bymatch as `match` FROM (SELECT id,word,LEFT(description,100)as description,
IFNULL((SELECT sum(vote)
FROM vote v
WHERE v.definition_id = d.id),0) as votecount,
";
$sqlword = "";
$sqldesc = "";
$sqlusage = "";
$sqlby = "";
foreach ($searches as $value) {
$value = mysqli_real_escape_string($con,$value);
$sqlword = $sqlword . "+ IFNULL(ROUND((LENGTH(word) - LENGTH(REPLACE(UPPER(word), UPPER('$value'), '')))/LENGTH('$value')),0)";
$sqldesc = $sqldesc . "+ IFNULL(ROUND((LENGTH(description) - LENGTH(REPLACE(UPPER(description), UPPER('$value'), '')))/LENGTH('$value')),0)";
$sqlusage = $sqlusage . "+ IFNULL(ROUND((LENGTH(`usage`) - LENGTH(REPLACE(UPPER(`usage`), UPPER('$value'), '')))/LENGTH('$value')),0)";
$sqlby = $sqlby . "+ IFNULL(ROUND((LENGTH(`by`) - LENGTH(REPLACE(UPPER(`by`), UPPER('$value'), '')))/LENGTH('$value')),0)";
}
$sql = $sql . $sqlword ." as wordmatch,"
. $sqldesc ." as descmatch,"
. $sqlusage ." as usagematch,"
. $sqlby ." as bymatch
FROM definition d
HAVING (wordmatch > 0 OR descmatch > 0 OR usagematch > 0 OR bymatch > 0)
ORDER BY
wordmatch DESC,
descmatch DESC,
usagematch DESC,
bymatch DESC,
votecount DESC)T1";
$queries[] = $sql;
$result = mysqli_query($con,$sql);
You can see this at work at http://unurbandictionary.comule.com/view_search.php?search=George+Miley+Cyrus which is what I get when I search for "George Miley Cyrus".
What it does is explode the search string into words and return the number of occurrences of each word in each of my columns; then I do an ORDER BY so that the most relevant (highest-priority) rows come back first. In my case the word field has the highest relevance, then the description field, then the usage field, then the by field.
Before this version of my code I was using LIKE, but it didn't give me a count of occurrences, and I want the rows with the most occurrences of my search words to come back first.
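To make the counting expression easier to follow, here is the same arithmetic as a small PHP sketch: remove every occurrence of the term, see how many characters disappeared, and divide by the term's length. count_occurrences is a hypothetical name, and the sketch assumes single-byte (ASCII) text:

```php
<?php
// Hypothetical helper mirroring the SQL expression
//   (LENGTH(col) - LENGTH(REPLACE(UPPER(col), UPPER(term), ''))) / LENGTH(term)
// Assumption: single-byte text; multibyte data would need the mb_* functions.
function count_occurrences(string $text, string $term): int
{
    if ($term === '') {
        return 0; // the SQL version divides by zero here and falls back to 0 via IFNULL
    }
    $upperText = strtoupper($text);
    $stripped  = str_replace(strtoupper($term), '', $upperText);

    return intdiv(strlen($upperText) - strlen($stripped), strlen($term));
}

echo count_occurrences('George and george', 'George'), "\n"; // prints 2
```

For plain PHP strings, substr_count() on the upper-cased values gives the same non-overlapping count; the longer form is shown only because it matches what the query does.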
You should really have some sort of id to select the rows in your table.
You should have put a column with
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
Then you could use
SELECT * FROM table WHERE column1 LIKE "%SearchTerm%" AND id BETWEEN 1 AND 40
I have this code:
while ($sum<16 || $sum>18){
$totala = 0;
$totalb = 0;
$totalc = 0;
$ranka = mysql_query("SELECT duration FROM table WHERE rank=1 ORDER BY rand() LIMIT 1");
$rankb = mysql_query("SELECT duration FROM table WHERE rank=2 ORDER BY rand() LIMIT 1");
$rankc = mysql_query("SELECT duration FROM table WHERE rank=3 ORDER BY rand() LIMIT 1");
while ($rowa = mysql_fetch_array($ranka)) {
echo $rowa['duration'] . "<br/>";
$totala = $totala + $rowa['duration'];
}
while ($rowb = mysql_fetch_array($rankb)) {
$totalb = $totalb + $rowb['duration'];
}
while ($rowc = mysql_fetch_array($rankc)) {
$totalc = $totalc + $rowc['duration'];
}
$sum=$totala+$totalb+$totalc;
}
echo $sum;
It works fine, but the problem is that echo $rowa['duration'] executes on every pass until $sum reaches 16. Is there a way to echo only the value from the last execution of the while ($rowa = mysql_fetch_array($ranka)) loop?
Most of the time it prints all the numbers tried until $sum reaches 16.
You are explicitly echoing $rowa['duration'] in the first inner while loop. If you only want to print the last duration from the $ranka set, simply change the echo to $rowa_duration = $rowa['duration'], then echo it outside the loop.
while ($rowa = mysql_fetch_array($ranka)) {
$rowa_duration = $rowa['duration'];
$totala = $totala + $rowa['duration'];
}
echo $rowa_duration . '<br/>';
What you are doing there is bad on multiple levels. And your English is horrid. Well, practice makes perfect. You could try joining the ##php chat room on the FreeNode server. That would improve both your English and PHP skills; it sure helped me a lot. Anyway...
The SQL
First of all, to use ORDER BY RAND() is extremely ignorant (at best). As your tables begin to get larger, this operation will make your queries slower. It has n * log2(n) complexity, which means that querying a table with 1000 entries will take ~300 times longer than querying a table with 10 entries.
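As a rough sanity check on the n * log2(n) estimate, here is the arithmetic for 10 versus 1000 rows. This is an idealized cost model, not a measurement of MySQL, and sort_cost is a hypothetical name:

```php
<?php
// Idealized cost model for sorting n rows: n * log2(n).
// This is a back-of-the-envelope estimate, not a measurement of MySQL.
function sort_cost(int $n): float
{
    return $n * log($n, 2);
}

// (1000 * log2(1000)) / (10 * log2(10)) = 100 * 3
$ratio = sort_cost(1000) / sort_cost(10);
printf("%.0f\n", $ratio); // prints 300
```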
To learn more about it, you should read this blog post; as for your current queries, the solution would look like:
SELECT duration
FROM table
JOIN (SELECT CEIL(RAND()*(SELECT MAX(id) FROM table)) AS id) as choice
WHERE
table.id >= choice.id
AND rank = 1
LIMIT 1
This would select a random duration from the table.
But since you are actually selecting data with 3 different ranks (1, 2 and 3), it would make sense to create a UNION of three queries:
(SELECT duration
FROM table
JOIN (SELECT CEIL(RAND()*(SELECT MAX(id) FROM table)) AS id) as choice
WHERE
table.id >= choice.id
AND rank = 1
LIMIT 1)
UNION ALL
(SELECT duration
FROM table
JOIN (SELECT CEIL(RAND()*(SELECT MAX(id) FROM table)) AS id) as choice
WHERE
table.id >= choice.id
AND rank = 2
LIMIT 1)
UNION ALL
(SELECT duration
FROM table
JOIN (SELECT CEIL(RAND()*(SELECT MAX(id) FROM table)) AS id) as choice
WHERE
table.id >= choice.id
AND rank = 3
LIMIT 1)
Looks scary, but it will actually be faster than what you are currently using, and the result will be three entries from the duration column.
PHP with SQL
You are still using the old mysql_* functions to access the database. This API is more than 10 years old and should not be used when writing new code. The old functions are no longer maintained (fixed and/or improved), and the community has begun the process of deprecating them.
Instead you should be using either PDO or MySQLi. Which one to use depends on your personal preferences and what is actually available to you. I prefer PDO (because of named parameters and support for other RDBMS), but that's a somewhat subjective choice.
Another issue with your PHP/MySQL code is that you seem to loop through items pointlessly. Your queries have LIMIT 1, which means that there will be only one row. No point in making a loop.
There is potential for an endless loop if the maximum value for duration is 1. At the start of the loop you will have $sum === 15, which fits the first while condition. And at the end of that loop you can have $sum === 18, which satisfies the second loop condition ... and then it is off to infinity and your SQL server chokes.
And if you are using fractions for duration, then the total value of the 3 new results needs to be even smaller: just over 2. Start with 15.99, end with 18.01 (that's an additional 2.02 in duration, or less than 0.7 each). Again, endless loop.
Suggestion
Here is how I would do it:
$pdo = new PDO('mysql:dbname=my_db;host=localhost', 'username', 'password');
$pdo->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);
$sum = 0;
while ( $sum < 16 )
{
$query = 'that LARGE query above';
$statement = $pdo->prepare( $query );
if ( $statement->execute() )
{
$data = $statement->fetchAll( PDO::FETCH_ASSOC );
$sum += $data[0]['duration']+$data[1]['duration']+$data[2]['duration'];
}
}
echo $data[0]['duration'];
This should do what your code did, or at least what I assume your intention was.
Looking for some advice on the best way to accomplish this. I've tried Unions, Joins, and Alias examples from a few Stack Overflow questions - none seems to get me where I want to go to no fault of theirs. I think I've just been looking to solve this the wrong way.
I've got one table that logs all activity from our users. Each log contains a column with an ID and another with a TIMESTAMP. There is no column that states what the event type is.
What I'm looking to do is grab counts within a range and append a virtual column with the activation date (first access) regardless if it is in the range or not. The business case for this is that I'd like to have reports that show users active within a range, their activation date, and the amount of events in the range.
The HTML output of this would look like this:
User / Total Visits in the Range / First Visit (in the range or not) / Most Recent Visit (in the range)
How I've gotten this far is by doing this:
$result = mysql_query("
SELECT user, MIN(timestamp), MAX(timestamp), count(user)
AS tagCount FROM " . $table . "
WHERE date(timestamp) BETWEEN '" . $startdate . "' AND '" . $enddate . "'
GROUP BY user
ORDER BY " . $orderby . " " . $order) or die(mysql_error());
I then loop:
$i = 1;
while($row = mysql_fetch_array($result)){
$user_name = str_replace("www.", "", $row['user']); // removing www from usernames
if( $i % 2 != 0) // used for alternating row colors
$iclass = "row";
else
$iclass = "row-bg";
echo "<div class=\"" . $iclass . "\"><div class=\"number\">" . $i . "</div><div class=\"number\">" . $row['tagCount'] . "</div><div class=\"name\">" . $user_name . "</div>" . "<div class=\"first\">" . $row['MIN(timestamp)'] . "</div><div class=\"recent\">" . $row['MAX(timestamp)'] . "</div></div>";
$i++;
}
The MIN(timestamp) in the above grabs the first timestamp in the range - I want to grab the first timestamp regardless of range.
How can I do this?
The key is to create a derived table that calculates each user's first access separately, and then join to it from your query that returns records for the time period you specify.
The below is SQL Server code, but I think it's fine in MySQL too. If not, let me know and I'll edit the syntax. The concept is sound either way though.
First, some setup code for the sample:
if object_id('tempdb..#eventlog') is not null
drop table #eventlog
create table #eventlog
(
userid int ,
eventtimestamp datetime
)
insert #eventlog
select 1,'2011-02-15'
UNION
select 1,'2011-02-16'
UNION
select 1,'2011-02-17'
UNION
select 2,'2011-04-18'
UNION
select 2,'2011-04-20'
UNION
select 2,'2011-04-21'
declare @StartDate datetime
declare @EndDate datetime
set @StartDate = '02-16-2011'
set @EndDate = '05-16-2011'
Here's the code that would solve your problem; you can replace #eventlog with your table name:
select e.userid,
min(e.eventtimestamp) as FirstVisitInRange,
max(e.eventtimestamp) as MostRecentVisitInRange,
min(e2.FirstAccess) as FirstAccessEver,
count(e.userid) as EventCountInRange
from #eventlog e
inner join
(select userid,min(eventtimestamp) as FirstAccess
from #eventlog
group by userid
) e2 on e.userid = e2.userid
where
e.eventtimestamp between @StartDate and @EndDate
group by e.userid
I have a MySQL table with phone calls. Every row means one phone call.
Columns are:
start_time
start_date
duration
I need to get the maximum number of phone calls in progress at the same time. This is needed for dimensioning a telephone exchange.
My solution is to create two timestamp columns, timestamp_start and timestamp_end. Then I run a loop second by second, day by day, and ask MySQL something like:
SELECT Count(*) FROM tbl WHERE start_date IN (thisday, secondday) AND "this_second_checking" BETWEEN timestamp_start AND timestamp_end;
It's quite slow.
Is there a better solution? Thank you!
EDIT - I use this solution and it gives me proper results. It uses the dibi SQL layer: http://dibiphp.com/cs/quick-start
$total_max = 0;
$starts = dibi::query("SELECT ts_start, ts_end FROM " . $tblname . " GROUP BY ts_start");
if(count($starts) > 0):
foreach ($starts as $row) {
if(isset($result)) unset($result);
$result = dibi::query('SELECT Count(*) FROM ' . $tblname . ' WHERE "'.$row->ts_start.'" BETWEEN ts_start AND ts_end');
$num = $result->fetchSingle();
if($total_max < $num):
$total_max = $num;
endif;
}
endif;
echo "Total MAX: " . $total_max;
Instead of running it second by second, you should for each row (phonecall) see what other phone calls were active at that time. After that you group all of the results by the row's ID, and check which has the maximum count. So basically something like this:
SELECT MAX(calls.count)
FROM (
SELECT a.id, COUNT(*) AS count
FROM tbl AS a
INNER JOIN tbl AS b ON (
(b.timestamp_start BETWEEN a.timestamp_start AND a.timestamp_end)
OR
(b.timestamp_end BETWEEN a.timestamp_start AND a.timestamp_end)
)
GROUP BY a.id
) AS calls
Creating an index on the timestamp columns will help as well.
I'm going to add something to @reko_t's answer. I think there is another case to consider:
Calls that started before and ended after, i.e. calls that completely overlap the one being checked.
So, how about:
SELECT MAX(calls.count)
FROM (
SELECT a.id, COUNT(*) AS count
FROM tbl AS a
INNER JOIN tbl AS b ON (
(b.timestamp_start BETWEEN a.timestamp_start AND a.timestamp_end)
OR
(b.timestamp_end BETWEEN a.timestamp_start AND a.timestamp_end)
OR
(b.timestamp_start <= a.timestamp_start AND b.timestamp_end >= a.timestamp_end)
)
GROUP BY a.id
) AS calls
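If the self-join gets heavy on a large table, a different technique is a sweep over start and end events in PHP after a single SELECT of the two timestamp columns. This is an alternative to the queries above, not a rewrite of them, and it is only a sketch: max_concurrent_calls is a hypothetical name, and end times are treated as inclusive to match BETWEEN.

```php
<?php
// Sweep-line sketch: each call is [start, end] as integer timestamps.
// A call ending at second t still counts as active at t (inclusive, like BETWEEN),
// so start events are processed before end events that share the same time.
function max_concurrent_calls(array $calls): int
{
    $events = [];
    foreach ($calls as $call) {
        $events[] = [$call[0], 0, +1]; // start: order key 0, one more active call
        $events[] = [$call[1], 1, -1]; // end:   order key 1, one fewer active call
    }

    usort($events, function (array $a, array $b): int {
        return [$a[0], $a[1]] <=> [$b[0], $b[1]];
    });

    $active = 0;
    $max = 0;
    foreach ($events as $event) {
        $active += $event[2];
        $max = max($max, $active);
    }
    return $max;
}

// Three calls overlap at second 10: [0,10], [5,15] and [10,20].
echo max_concurrent_calls([[0, 10], [5, 15], [10, 20], [30, 40]]), "\n"; // prints 3
```

The sort dominates the cost, so this runs in n log n over the number of calls rather than comparing every call against every other call.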
How about:
SELECT MAX(callCount) FROM (SELECT COUNT(duration) AS callCount, CONCAT(start_date,start_time) AS callTime FROM tbl GROUP BY callTime) AS t
That would give you the max number of calls that started at a single "time", assuming start_date and start_time are strings. If they're integer times, you could probably optimise it somewhat. Note that MySQL requires an alias (here t) on the derived table.