SQL query efficiency between two tables - php

I have the following MySQL query, which is very slow (about 15 seconds). I have changed the names of columns and tables, so apologies for any typos; the original query works, so please focus on the concept rather than the literal query.
SELECT
id,
b,
(SELECT MAX( day )
FROM all_days
WHERE all_days.id = X.id
) AS day
FROM X
Note that all_days has more than 2 million rows. It has 3 indexes: one on id, one on day, and one on {id, day}.
But if I split the query into N queries combined with UNION, it takes about 1 second or less and gives the same result:
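<?php
$ids = getIds(); // get all IDs from X with a query
$query = ""; // start empty so .= does not append to an undefined variable
$i = 0;
foreach ($ids as $id) {
if ($i++ > 0) {
$query .= " UNION ";
}
$id = (int) $id; // ids are integers; the cast keeps the interpolation safe
$query .= "SELECT MAX( day )
FROM all_days
WHERE all_days.id = $id";
}
?>
Any ideas on how I could increase the speed without using UNION?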
EDIT (added structure):
Table X:
id INTEGER PRIMARY KEY
b INTEGER -- extra info
Table all_days:
day_id INTEGER PRIMARY KEY
id INTEGER FK X.id
day DATETIME
all_days indexes:
id
day
id,day

Please try this query:
SELECT
X.id,
X.b,
max_days.max_day
FROM X
INNER JOIN
(
SELECT id, MAX(`day`) AS max_day
FROM all_days
GROUP BY id
) AS max_days
ON max_days.id = X.id
The reason this should be much faster is that MAX(day) is computed once per id, held in memory (or in a temporary table on disk if the result is too large), and then joined to table X. In your query, every row of table X is read and, for each row, table all_days is queried again. Note that the INNER JOIN omits rows of X that have no match in all_days; use a LEFT JOIN if you need them.
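As a rough illustration of the two forms, here is a minimal sketch in Python using the standard sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained; the sample rows are made up). It runs the correlated-subquery form and the derived-table form on a tiny data set and checks that they return the same rows. The sketch uses a LEFT JOIN so that ids with no rows in all_days are kept with a NULL day, as in the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE X (id INTEGER PRIMARY KEY, b INTEGER);
    CREATE TABLE all_days (day_id INTEGER PRIMARY KEY, id INTEGER, day TEXT);
    CREATE INDEX idx_id_day ON all_days (id, day);
    -- made-up sample data; id 3 has no rows in all_days
    INSERT INTO X VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO all_days (id, day) VALUES
        (1, '2014-01-01'), (1, '2014-03-01'), (2, '2014-02-01');
""")

# Form 1: one subquery evaluated per row of X.
correlated = conn.execute("""
    SELECT id, b,
           (SELECT MAX(day) FROM all_days WHERE all_days.id = X.id) AS day
    FROM X
    ORDER BY id
""").fetchall()

# Form 2: aggregate all_days once, then join the result to X.
joined = conn.execute("""
    SELECT X.id, X.b, max_days.max_day
    FROM X
    LEFT JOIN (SELECT id, MAX(day) AS max_day
               FROM all_days
               GROUP BY id) AS max_days
           ON max_days.id = X.id
    ORDER BY X.id
""").fetchall()

print(correlated == joined)
```

Timings on a toy table say nothing about 2 million rows, so it is worth checking the plan of the real query with EXPLAIN before and after the change.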

In a simple situation like this (assuming the combination of X.id / X.b is unique) then this can be done without the need for a sub query:-
SELECT X.id,
X.b,
MAX( all_days.day ) AS day
FROM X
LEFT OUTER JOIN all_days
ON all_days.id = X.id
GROUP BY X.id, X.b

Related

Select Nth record from MySQL query from Millions of rows

I have a MySQL query as below; I would like to select the top record for each range of 600 records in a table with 1.8M records. So far I have to loop 3,000 times to accomplish this which is not an efficient solution.
Database Schema;
Table: bet_perm_13predict
id bet_id perm_id avg_odd avg_odd2 avg_odd3
1 23 1 43.29 28.82 28.82
2 23 2 42.86 28.59 28.59
3 23 3 43.13 28.73 28.73
Table: bet_permute_13games
perm_id perm_code
1 0000000000000
2 0000000000001
3 0000000000002
4 0000000000010
Sample MySQL Query in PHP
$totRange = 0; // start of the current range
$range = 600; // size of each range
$stop = 0; // end of the current range
while ($totRange < 1800000) {
$stop = $totRange + $range;
$sql = "SELECT (tb1.avg_odd2 + tb1.avg_odd3) AS totAvg_odd ,
tb1.perm_id , tb1.avg_odd, tb1.avg_odd2, tb1.avg_odd3, tb2.perm_code
FROM bet_perm_13predict tb1
INNER JOIN bet_permute_13games tb2 ON tb2.perm_id = tb1.perm_id
WHERE tb1.bet_id = '$bet_id' AND tb1.perm_id
BETWEEN $totRange AND $stop ORDER BY totAvg_odd ASC LIMIT 1";
$q1 = $this->db->query($sql);
$totRange = $stop;
}
In other words I want to select a sample of the data that will represent the entire table with the sample not being random but predefined using the top record in range of 600. So far I have no idea how to proceed. There is no clear online material on this subject.
You can use integer division to create groups.
SELECT ID, ID DIV 600 as grp
FROM Table1
Then find the max value in each group. Some options are covered in the question "Get records with max value for each group of grouped SQL results".
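To make the grouping idea concrete, here is a minimal sketch in Python with the standard sqlite3 module (a stand-in for MySQL; the table t, its column total, and the sample values are hypothetical). SQLite's `/` performs integer division on integers, which plays the role of MySQL's DIV. The inner query finds the lowest total per group, and the join picks the row that holds it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (perm_id INTEGER PRIMARY KEY, total REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 5.0), (2, 3.0), (3, 9.0), (4, 7.0), (5, 2.0), (6, 8.0)],
)

GROUP_SIZE = 3  # 600 in the question; kept small here for readability

rows = conn.execute("""
    SELECT t.perm_id, t.total, g.grp
    FROM t
    JOIN (SELECT perm_id / :n AS grp, MIN(total) AS best
          FROM t
          GROUP BY perm_id / :n) AS g
      ON g.grp = t.perm_id / :n   -- same group
     AND g.best = t.total         -- and the lowest total in that group
    ORDER BY t.perm_id
""", {"n": GROUP_SIZE}).fetchall()

for perm_id, total, grp in rows:
    print(grp, perm_id, total)
```

If two rows in a group tie for the lowest total, both are returned, so a tie-breaker (for example the smaller perm_id) may be needed.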
For those who might encounter the same issue, this is how I solved it. I used @Juan Carlos's suggestion and added a way to pick the top record of each group using a subquery.
SELECT * FROM
(SELECT * , perm_id DIV $limit as grp , (avg_odd2 + avg_odd3) AS totAvg_odd
FROM bet_perm_13predict WHERE bet_id = '$bet_id' ORDER BY grp ASC ) tb1
INNER JOIN bet_permute_13games tb2 ON tb2.perm_id = tb1.perm_id
INNER JOIN bet_entry tb3 ON tb3.bet_id = tb1.bet_id
WHERE tb1.avg_odd2 < (SELECT AVG(avg_odd2) FROM bet_perm_13predict WHERE bet_id = '$bet_id' )
&& tb1.avg_odd3 < (SELECT AVG(avg_odd3) FROM bet_perm_13predict WHERE bet_id = '$bet_id' )
GROUP BY grp ORDER BY totAvg_odd ASC
LIMIT 100

Random allocation until all options are used

I have a table, teams, of available teams, with 24 different options.
I have another table entries, where each row is an allocation of one team to a user.
When an entry is created, a random team that has not been picked is allocated. However, if all the teams have been allocated (this can happen multiple times), only teams not yet allocated in this round of allocation are available.
For example, if my teams are A, B, C and D:
If there is an entry for A in entries, only B, C and D are available
If A, B, C and D have been picked, they are all available again
If A has 3 entries, B has 3 entries, C has 2 entries and D has 2 entries, only C and D are available, until they all have the same number of entries
My code for this is convoluted:
//Make array of teams
for($i=1;$i<=24;$i++) $team[$i] = 1;
//Get entries from database
$stmt = $dbh->prepare("SELECT `team` FROM `entries`");
$stmt->execute();
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
//Create array of available teams
$numRows = $stmt->rowCount();
while($numRows >= 24) {
for($i=1;$i<=24;$i++) {
$team[$i] = $team[$i]+1;
}
$numRows = $numRows - 24;
}
//Remove entries for teams in array
foreach($rows as $row) $team[$row["team"]] = $team[$row["team"]]-1;
foreach($team as $i => $v) if($v > 0) $available[] = $i;
There must be a more straightforward method to accomplish this; how can this be done?
The following gives you the number of assignments for each team:
SELECT team, COUNT(*) FROM entries GROUP BY team;
This gives you the minimum count for any team:
SELECT MIN(num) FROM (
SELECT COUNT(*) as num FROM entries GROUP BY team
) as tcounts
To get the teams with the minimum count (those being available), put those two queries together into one:
SELECT teamcounts.team
FROM
(SELECT team, COUNT(*) as num FROM entries GROUP BY team) as teamcounts
WHERE
teamcounts.num = (
SELECT MIN(num) FROM (
SELECT COUNT(*) as num FROM entries GROUP BY team
) as tcounts
)
To also get the teams that do not yet appear in entries, we have to use the teams table as well, removing all teams not currently available for selection:
SELECT teams.name
FROM teams
WHERE teams.name NOT IN (
SELECT teamcounts.team
FROM
(SELECT team, COUNT(*) as num FROM entries GROUP BY team) as teamcounts
WHERE
teamcounts.num != (
SELECT MIN(num) FROM (
SELECT COUNT(*) as num FROM entries GROUP BY team
) as tcounts
)
)
I haven't found a solution that works solely in SQL, however I've created the following query:
SELECT `id`, `num_selected` FROM
(SELECT `id`, SUM(is_selected) AS `num_selected` FROM
(SELECT t.`id`, CASE WHEN e.`team` IS NULL THEN 0 ELSE 1 END AS is_selected FROM `entries` e RIGHT JOIN `teams` t ON t.`id` = e.`team`)
AS `table1`
GROUP BY `id`)
AS `table2` GROUP BY `id` ORDER BY `num_selected` ASC, `id` ASC
This includes all the team rows that have NO entries yet, and the result is a table that has every team in one column, and alongside them is the number of selections.
Then, in PHP, I simply take the lowest value of selections (this will be the first row, as I've ordered by num_selected ASC) and only use other rows with that value as possible options:
$baseNum = $rows[0]["num_selected"];
foreach($rows as $row){
if($row["num_selected"]===$baseNum) $availableTeams[] = $row["id"];
}
However, ideally I'd have a solution that takes place solely in the SQL query!
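One way this might be done entirely in SQL is to count entries per team with a LEFT JOIN, so that teams with no entries get a count of 0, and then keep only the teams whose count equals the minimum. The sketch below uses Python's standard sqlite3 module as a stand-in for MySQL and assumes, as in the query above, that entries.team references teams.id; the four-team data set is made up. It uses a common table expression, which needs MySQL 8.0 or later (on older versions the counts subquery would have to be repeated).

```python
import sqlite3

def available_teams(conn):
    """Return ids of the teams with the fewest entries (0 counts included)."""
    rows = conn.execute("""
        WITH counts AS (
            SELECT t.id AS id, COUNT(e.team) AS num  -- COUNT skips NULLs
            FROM teams t
            LEFT JOIN entries e ON e.team = t.id
            GROUP BY t.id
        )
        SELECT id
        FROM counts
        WHERE num = (SELECT MIN(num) FROM counts)
        ORDER BY id
    """).fetchall()
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teams (id TEXT PRIMARY KEY);
    CREATE TABLE entries (team TEXT);
    INSERT INTO teams VALUES ('A'), ('B'), ('C'), ('D');
""")

print(available_teams(conn))  # no entries yet: ['A', 'B', 'C', 'D']
conn.execute("INSERT INTO entries VALUES ('A')")
print(available_teams(conn))  # A is used up this round: ['B', 'C', 'D']
```

Picking the random team could then be done in PHP from the returned list, or by adding ORDER BY RAND() LIMIT 1 in MySQL.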

How to calculate difference between values coming from the same row in mysql

I am trying to calculate the differences between consecutive values in a list coming from a database.
I would like to achieve it using PHP or MySQL, but I do not know how to proceed.
I have a table named player_scores. One of its columns contains the goals scored.
Ex.
pl_date pl_scores
03/11/2014 18
02/11/2014 15
01/11/2014 10
I would like to echo the difference between the goals scored in matches played on different dates.
Ex:
pl_date pl_scores diff
03/11/2014 18 +3
02/11/2014 15 +5
01/11/2014 10 no diff
How can I obtain the desired result?
You seem to want to compare a score against the score on a previous row.
This is possibly simplest using a sub query that gets the max pl_date that is less than the pl_date of the current row, then joining the results of that sub query back against the player_scores table to get the details for each date:-
SELECT ps1.pl_date, ps1.pl_scores, IF(ps2.pl_date IS NULL OR ps1.pl_scores = ps2.pl_scores, 'no diff', ps1.pl_scores - ps2.pl_scores) AS diff
FROM
(
SELECT ps1.pl_date, MAX(ps2.pl_date) prev_date
FROM player_scores ps1
LEFT OUTER JOIN player_scores ps2
ON ps1.pl_date > ps2.pl_date
GROUP BY ps1.pl_date
) sub0
INNER JOIN player_scores ps1
ON sub0.pl_date = ps1.pl_date
LEFT OUTER JOIN player_scores ps2
ON sub0.prev_date = ps2.pl_date
There are potentially other ways to do this (for example, using variables to work through the results of an ordered sub query, comparing each row with the value stored in the variable for the previous row).
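As a quick check of the self-join approach, here is a minimal sketch in Python with the standard sqlite3 module (a stand-in for MySQL). It assumes the dates are stored in ISO form (YYYY-MM-DD) so that they compare chronologically; dates kept as DD/MM/YYYY strings would not sort correctly. The "+3" formatting is done in Python rather than in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE player_scores (pl_date TEXT PRIMARY KEY, pl_scores INTEGER);
    INSERT INTO player_scores VALUES
        ('2014-11-01', 10), ('2014-11-02', 15), ('2014-11-03', 18);
""")

rows = conn.execute("""
    SELECT ps1.pl_date, ps1.pl_scores,
           ps1.pl_scores - ps2.pl_scores AS diff  -- NULL when no earlier row
    FROM (SELECT a.pl_date, MAX(b.pl_date) AS prev_date
          FROM player_scores a
          LEFT JOIN player_scores b ON a.pl_date > b.pl_date
          GROUP BY a.pl_date) AS sub0
    JOIN player_scores ps1 ON sub0.pl_date = ps1.pl_date
    LEFT JOIN player_scores ps2 ON sub0.prev_date = ps2.pl_date
    ORDER BY ps1.pl_date DESC
""").fetchall()

for pl_date, score, diff in rows:
    label = "no diff" if diff is None else f"{diff:+d}"
    print(pl_date, score, label)
```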
SELECT score FROM TABLE WHERE DATE = TheDateYouWant
$score = $data['score'];
SELECT score FROM TABLE WHERE date = dateYouWant
$difference = $score - $data['score'];
Something like this?
You could use two queries: one to get the value to use in the comparison (in the example below, the lowest score), and a second one to get the records with a dedicated column holding the difference (minScore stands for the value returned by the first query):
SELECT MIN(pl_scores) FROM player_scores;
SELECT pl_date, pl_scores, (pl_scores - minScore) as diff FROM player_scores;
Or, using a transaction (one query execution on the PHP side):
START TRANSACTION;
SELECT MIN(Importo) FROM Transazione INTO @min;
SELECT Importo, (Importo - @min) as diff FROM Transazione;
select *,
coalesce(
(SELECT concat(IF(t1.pl_scores>t2.pl_scores,'+',''),(t1.pl_scores-t2.pl_scores))
FROM tableX t2 WHERE t2.pl_date<t1.pl_date ORDER BY t2.pl_date DESC LIMIT 1)
, 'no data' ) as diff
FROM tableX t1
WHERE 1
order by t1.pl_date DESC

Get variance and standard deviation of two numbers in two different rows/columns with sqlite / PHP

I have a SQLite Database with the following structure:
rowid ID startTimestamp endTimestamp subject
1 00:50:c2:63:10:1a 1000 1090 entrance
2 00:50:c2:63:10:1a 1100 1270 entrance
3 00:50:c2:63:10:1a 1300 1310 door1
4 00:50:c2:63:10:1a 1370 1400 entrance
.
.
.
I have prepared a sqlfiddle here: http://sqlfiddle.com/#!2/fe8c6/2
With this SQL query I can get the average difference between the endTimestamp of one row and the startTimestamp of the following row, for a given subject and ID:
SELECT
id,
( MAX(endtimestamp) - MIN(startTimestamp)
- SUM(endtimestamp-startTimestamp)
) / (COUNT(*)-1) AS averageDifference
FROM
table1
WHERE ID = '00:50:c2:63:10:1a'
AND subject = 'entrance'
GROUP BY id;
My problem: calculating the average value is no problem; this query does that. But how can I get the standard deviation and the variance of these values?
First find the time differences of interest by joining the table to itself and grouping by ID, then compute the average, the variance as V(x) = E(x^2) - (E(x))^2, and the standard deviation as sqrt(V). This gives:
SELECT ID, AVG(diff) AS average,
AVG(diff*diff) - AVG(diff)*AVG(diff) AS variance,
SQRT(AVG(diff*diff) - AVG(diff)*AVG(diff)) AS stdev
FROM
(SELECT t1.id, t1.endTimestamp,
min(t2.startTimeStamp) - t1.endTimestamp AS diff
FROM table1 t1
INNER JOIN table1 t2
ON t2.ID = t1.ID AND t2.subject = t1.subject
AND t2.startTimestamp > t1.startTimestamp -- consider only later startTimestamps
WHERE t1.subject = 'entrance'
GROUP BY t1.id, t1.endTimestamp) AS diffs
GROUP BY ID
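To sanity-check the E(x^2) - (E(x))^2 formula, here is a minimal sketch in Python using the standard sqlite3 module and the four sample rows from the question. The square root is taken in Python, on the assumption that the SQLite build may not provide SQRT, and the result is compared with statistics.pvariance (population variance) over the two gaps, 10 and 100.

```python
import math
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ID TEXT, startTimestamp INTEGER,
                         endTimestamp INTEGER, subject TEXT);
    INSERT INTO table1 VALUES
        ('00:50:c2:63:10:1a', 1000, 1090, 'entrance'),
        ('00:50:c2:63:10:1a', 1100, 1270, 'entrance'),
        ('00:50:c2:63:10:1a', 1300, 1310, 'door1'),
        ('00:50:c2:63:10:1a', 1370, 1400, 'entrance');
""")

device_id, average, variance = conn.execute("""
    SELECT ID, AVG(diff), AVG(diff * diff) - AVG(diff) * AVG(diff)
    FROM (SELECT t1.ID AS ID, t1.endTimestamp,
                 MIN(t2.startTimestamp) - t1.endTimestamp AS diff
          FROM table1 t1
          INNER JOIN table1 t2
                  ON t2.ID = t1.ID AND t2.subject = t1.subject
                 AND t2.startTimestamp > t1.startTimestamp
          WHERE t1.subject = 'entrance'
          GROUP BY t1.ID, t1.endTimestamp) AS diffs
    GROUP BY ID
""").fetchone()

stdev = math.sqrt(variance)
expected = statistics.pvariance([10, 100])  # gaps: 1100-1090 and 1370-1270
print(average, variance, stdev, expected)
```

This is the population variance; for the sample variance the result would need to be scaled by n / (n - 1).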
For formulas that are more complex than simple summation, you have to compute the actual difference values for each record by looking up the corresponding next start times, like this:
SELECT timeDifference
FROM (SELECT (SELECT MIN(next.startTimestamp)
FROM table1 AS next
WHERE next.startTimestamp > table1.startTimestamp
AND next.ID = '...'
) - endTimestamp AS timeDifference
FROM table1
WHERE ID = '...') AS diffs
WHERE timeDifference IS NOT NULL -- the last record has no next start time
Then you can use all the difference values to do the calculations:
-- 1.0 * forces floating-point division; SQLite truncates integer / integer
SELECT 1.0 * SUM(timeDifference) / COUNT(*) AS average,
AVG(timeDifference) AS moreEfficientAverage,
1.0 * SUM(timeDifference * timeDifference) / COUNT(*) -
AVG(timeDifference) * AVG(timeDifference) AS variance
FROM (SELECT (SELECT MIN(next.startTimestamp)
FROM table1 AS next
WHERE next.startTimestamp > table1.startTimestamp
AND next.ID = '...'
) - endTimestamp AS timeDifference
FROM table1
WHERE ID = '...') AS diffs
WHERE timeDifference IS NOT NULL
A number of points:
Your formula for the mean is wrong; the correct formula is SUM(endtimestamp-starttimestamp)/COUNT(endtimestamp). I have no idea why you have the MIN/MAX terms. COUNT(*) also counts rows containing NULLs, which gives the wrong result.
SQLite has an avg function which finds the mean.
The formula for the variance is AVG((endtimestamp-starttimestamp)*(endtimestamp-starttimestamp)) - AVG(endtimestamp-starttimestamp)*AVG(endtimestamp-starttimestamp)
The standard deviation is the square root of the variance.
In response to the question author's comment: in order to compute the variance, the start and end times must be paired with each other through a self join.
Because SQLite lacks a row_number function, this is a little inelegant.
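SELECT id,
AVG(startTimestamp - endTimestamp) AS mean,
-- variance = E(x^2) - E(x)^2; x*x is used because ^ is not a power operator
AVG((startTimestamp - endTimestamp) * (startTimestamp - endTimestamp))
- AVG(startTimestamp - endTimestamp) * AVG(startTimestamp - endTimestamp) AS variance,
SQRT(AVG((startTimestamp - endTimestamp) * (startTimestamp - endTimestamp))
- AVG(startTimestamp - endTimestamp) * AVG(startTimestamp - endTimestamp)) AS stDev
FROM
(SELECT
t1.id,
t1.endTimestamp,
MIN(t2.startTimestamp) AS startTimestamp
FROM table1 t1
INNER JOIN
table1 t2 ON t1.id = t2.id AND t1.endTimestamp <= t2.startTimestamp
GROUP BY t1.id, t1.endTimestamp) t
GROUP BY id;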
See SQL Fiddle

SELECT from two tables WHERE different columns in each table equal $id ORDER BY common column (PHP/MySQL)

I'm trying to SELECT from two tables and ORDER BY date (a column they both have). One table (tableA) has a column called "A" and the other table (tableB) has a column called "B", I use array_key_exists() to differentiate between the two (If "A" key exists, I run the array through FunctionA(), if "B" key exists, I run the array through FunctionB()). I only need the 20 latest (date wise) entries. I need the SQL Query to accomplish this.
I already know a reply will be "if they're similarly structured, then you should just use a single table", but I don't want to do that because tableA is drastically different from tableB (a lot more columns in tableA), and using a single table to store the data would result in a LOT of empty columns for entries formatted for tableB, not to mention it'd be a very ugly looking table format due to tableB not needing the majority of tableA's columns.
I just want to display data from both tables in an ordered (by date) fashion, and in one single stream.
I need to SELECT WHERE tableA.poster_id = $id and tableB.receiver_id = $id by the way.
SOLUTION:
I'm updating this just in case anyone else with the same dilemma comes along. After implementing the SQL query that @Erik A. Brandstadmoen graciously gave me, this is basically what my code ended up as:
$MySQL->SQL("SELECT * FROM
(SELECT A.id AS id, A.date AS date, 'tableA' AS source
FROM tableA A WHERE A.poster_id = $id
UNION
SELECT B.id AS id, B.date AS date, 'tableB' AS source
FROM tableB B WHERE B.receiver_id = $id) AS T
ORDER BY T.date DESC LIMIT 0, 20");
$GetStream = array();
$i = 0;
while ($row = mysql_fetch_array($MySQL->Result))
{
$GetStream[$i]['id'] = $row['id'];
$GetStream[$i]['date']=$row['date'];
$GetStream[$i]['source'] = $row['source'];
$i++;
}
// ... later on in the code ...
$i = 0;
while ($i<count($GetStream))
{
if ($GetStream[$i]['source'] == "tableA")
{
FunctionA($GetStream[$i]);
}
else
{
FunctionB($GetStream[$i]);
}
$i++;
}
Try using UNION:
SELECT * FROM (
SELECT A.col1 AS x, A.col2 As y, A.col3 AS date FROM tableA A
WHERE A.poster_id = $id
UNION
SELECT B.colA AS x, B.colB AS y, B.colC AS date FROM tableB B
WHERE B.receiver_id = $id
) AS T
ORDER BY date DESC
LIMIT 0, 20
Or, if you would like to keep duplicates between tableA and tableB, use UNION ALL instead.
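The difference between UNION and UNION ALL can be seen in a minimal sketch like the one below, written in Python with the standard sqlite3 module as a stand-in for MySQL. The rows are made up, with one (id, date) pair deliberately present in both tables, and the id is passed as a bound parameter rather than interpolated into the string, which is a suggestion of this sketch and not part of the original code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER, date TEXT, poster_id INTEGER);
    CREATE TABLE tableB (id INTEGER, date TEXT, receiver_id INTEGER);
    INSERT INTO tableA VALUES (1, '2012-01-01', 7), (2, '2012-01-03', 7);
    INSERT INTO tableB VALUES (1, '2012-01-01', 7), (5, '2012-01-02', 7);
""")

def stream(op, user_id):
    """Return the 20 latest rows from both tables, combined with op."""
    if op not in ("UNION", "UNION ALL"):
        raise ValueError("op must be 'UNION' or 'UNION ALL'")
    sql = f"""
        SELECT * FROM (
            SELECT A.id AS id, A.date AS date
            FROM tableA A WHERE A.poster_id = ?
            {op}
            SELECT B.id AS id, B.date AS date
            FROM tableB B WHERE B.receiver_id = ?
        ) AS T
        ORDER BY T.date DESC
        LIMIT 20
    """
    return conn.execute(sql, (user_id, user_id)).fetchall()

print(len(stream("UNION", 7)))      # identical rows are merged
print(len(stream("UNION ALL", 7)))  # every row from both tables is kept
```

Once a constant source column is added to each SELECT, rows from different tables are never identical, so UNION and UNION ALL return the same rows and UNION ALL avoids the de-duplication step.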
EDIT, according to your comments, I understand that you need a column indicating which table the row is from. You can just add a static column in the select, like this:
SELECT * FROM (
SELECT A.col1 AS x, A.col2 As y, A.col3 AS date, 'A' as source FROM tableA A
WHERE A.poster_id = $id
UNION
SELECT B.colA AS x, B.colB AS y, B.colC AS date, 'B' as source FROM tableB B
WHERE B.receiver_id = $id
) AS T
ORDER BY date DESC
LIMIT 0, 20
This gives you a nice table in the following form:
x y date source
=========================
(v1) (v2) (d1) 'A'
(v3) (v4) (d2) 'B'
(v1) (v2) (d3) 'B'
(v3) (v4) (d4) 'A'
That does what you want, doesn't it? It's a bit difficult understanding what you are really trying to achieve with this...
