Every time a logged-in user visits the website, the visit is recorded in a table containing the userId and date (at most one row per user per day):
444631 2011-11-07
444631 2011-11-06
444631 2011-11-05
444631 2011-11-04
444631 2011-11-02
444631 2011-11-01
I need to have ready access to the number of consecutive visits when I pull the user data from the main user table. For this user, it would be 4.
Currently I'm doing this through a denormalized consecutivevisits counter in the main user table; however, for unknown reasons it sometimes resets. I want to try an approach that uses exclusively the data in the table above.
What's the best SQL query to get that number (4 in the example above)? There are users who have hundreds of visits, we have millions of registered users and hits per day.
EDIT: As per the comments below, I'm posting the code I currently use to do this. Its problem is that the counter sometimes resets for no reason, and it also reset for everyone during the weekend, most likely because of the DST change.
// Called every page load for logged in users
public static function OnVisit($user)
{
    $lastVisit = $user->GetLastVisit(); /* Timestamp; db server is on the same timezone as www server */
    if(!$lastVisit)
        $delta = 2;
    else
    {
        $today = date('Y/m/d');
        if(date('Y/m/d', $lastVisit) == $today)
            $delta = 0;
        else if(date('Y/m/d', $lastVisit + (24 * 60 * 60)) == $today)
            $delta = 1;
        else
            $delta = 2;
    }
    if(!$delta)
        return;
    $visits = $user->GetConsecutiveVisits();
    $userId = $user->GetId();
    /* NOTE: t_dailyvisit is the table I pasted above. The table is unused;
     * I added it only to ensure that the counter sometimes really resets
     * even if the user visits the website, and I could confirm that. */
    q_Query("INSERT IGNORE INTO `t_dailyvisit` (`user`, `date`) VALUES ($userId, CURDATE())", DB_DATABASE_COMMON);
    /* User skipped 1 or more days.. */
    if($delta > 1)
        $visits = 1;
    else if($delta == 1)
        $visits += 1;
    q_Query("UPDATE `t_user` SET `consecutivevisits` = $visits, `lastvisit` = CURDATE(), `nvotesday` = 0 WHERE `id` = $userId", DB_DATABASE_COMMON);
    $user->ForceCacheExpire();
}
I missed the mysql tag and wrote up this solution. Sadly, it does not work in MySQL versions before 8.0, which do not support window functions.
I post it anyway, as I put some effort into it. Tested with PostgreSQL. It would work similarly with Oracle or SQL Server (or any other decent RDBMS that supports window functions).
Test setup
CREATE TEMP TABLE v(id int, visit date);
INSERT INTO v VALUES
(444631, '2011-11-07')
,(444631, '2011-11-06')
,(444631, '2011-11-05')
,(444631, '2011-11-04')
,(444631, '2011-11-02')
,(444631, '2011-11-01')
,(444632, '2011-12-02')
,(444632, '2011-12-03')
,(444632, '2011-12-05');
Simple version
-- add 1 to "difference" to get number of days of the longest period
SELECT id, max(dur) + 1 as max_consecutive_days
FROM (
-- calculate date difference of min and max in the group
SELECT id, grp, max(visit) - min(visit) as dur
FROM (
-- consecutive days end up in a group
SELECT *, sum(step) OVER (ORDER BY id, rn) AS grp
FROM (
-- step up at the start of a new group of days
SELECT id
,row_number() OVER w AS rn
,visit
,CASE WHEN COALESCE(visit - lag(visit) OVER w, 1) = 1
THEN 0 ELSE 1 END AS step
FROM v
WINDOW w AS (PARTITION BY id ORDER BY visit)
ORDER BY 1,2
) x
) y
GROUP BY 1,2
) z
GROUP BY 1
ORDER BY 1
LIMIT 1;
Output:
id | max_consecutive_days
--------+----------------------
444631 | 4
Faster / Shorter
I later found an even better way. The grp numbers are not continuous (but continuously rising). That doesn't matter, since they are just a means to an end:
SELECT id, max(dur) + 1 AS max_consecutive_days
FROM (
SELECT id, grp, max(visit) - min(visit) AS dur
FROM (
-- subtracting row_number() from an integer representing the day number
-- creates a "group number" (grp) for consecutive days
SELECT id
,EXTRACT(epoch from visit)::int / 86400
- row_number() OVER (PARTITION BY id ORDER BY visit) AS grp
,visit
FROM v
ORDER BY 1,2
) x
GROUP BY 1,2
) y
GROUP BY 1
ORDER BY 1
LIMIT 1;
SQL Fiddle for both.
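To see why the "Faster / Shorter" query works, here is the same idea outside the database. This is only a minimal Python sketch, assuming the visits are available as (id, date) pairs; the function name longest_streaks is made up for the illustration. Within a run of consecutive days, the day number minus the row number stays constant, so that difference can serve as the group key.

```python
from collections import defaultdict
from datetime import date


def longest_streaks(visits):
    """Return {user_id: length of the longest run of consecutive days}."""
    by_user = defaultdict(set)
    for user_id, day in visits:
        by_user[user_id].add(day)  # at most one entry per user per day

    result = {}
    for user_id, days in by_user.items():
        runs = defaultdict(int)
        for rn, day in enumerate(sorted(days), start=1):
            # Day number minus row number is constant inside a run,
            # so it plays the role of "grp" in the SQL query.
            runs[day.toordinal() - rn] += 1
        result[user_id] = max(runs.values())
    return result


visits = [
    (444631, date(2011, 11, 7)),
    (444631, date(2011, 11, 6)),
    (444631, date(2011, 11, 5)),
    (444631, date(2011, 11, 4)),
    (444631, date(2011, 11, 2)),
    (444631, date(2011, 11, 1)),
    (444632, date(2011, 12, 2)),
    (444632, date(2011, 12, 3)),
    (444632, date(2011, 12, 5)),
]
print(longest_streaks(visits))  # {444631: 4, 444632: 2}
```

Like the SQL above, this reports the longest streak per user, which happens to be the current one in the example data.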
More
A procedural solution for a similar problem.
You might be able to implement something similar in MySQL.
Closely related answers on dba.SE with extensive explanation here and here.
And on SO:
GROUP BY and aggregate sequential numeric values
If it is not necessary to keep a log of every day the user was logged on to the website and you only want to know the number of consecutive days they were logged on, I would prefer this way:
Choose 3 columns: LastVisit (Date), ConsecutiveDays (int) and User.
On log-in you check the entry for the user. If the last visit was "Today - 1", add 1 to the column ConsecutiveDays and store "Today" in the column LastVisit. If the last visit is older than "Today - 1", store 1 in ConsecutiveDays and "Today" in LastVisit. If the last visit is already "Today", nothing changes.
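The update rule can be sketched roughly like this. It is an illustration only: the function name update_streak and the use of None for "never visited" are assumptions, not part of any existing code. Comparing calendar dates instead of adding 24 * 60 * 60 seconds to a timestamp also sidesteps the DST problem suspected in the question.

```python
from datetime import date, timedelta


def update_streak(last_visit, consecutive_days, today):
    """Return the new (last_visit, consecutive_days) pair after a log-in.

    last_visit is a date, or None for a user who has never visited.
    """
    if last_visit == today:
        # Already counted today: nothing changes.
        return last_visit, consecutive_days
    if last_visit == today - timedelta(days=1):
        # Visited yesterday: the streak continues.
        return today, consecutive_days + 1
    # First visit ever, or at least one day was skipped: start over.
    return today, 1


print(update_streak(date(2011, 11, 6), 3, date(2011, 11, 7)))  # streak grows to 4
print(update_streak(date(2011, 11, 2), 2, date(2011, 11, 4)))  # streak resets to 1
```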
HTH
Related
Hi guys, I have a question. I am still learning and am trying to get some data out. Below is the table. It has hundreds of rows, but for example:
FormNR  Datum       XX1  XX2  XX3
0001    2022-09-08  4    23   7
0002    2022-09-10  8    5    0
The table name is 'forms'. Now what I need to do is to count the values in XX1+XX2+XX3 (for a yearly report). I also have a 'date from and to' selection box on my page. So the question would be:
Which instanties (agencies) have been used in total between two dates, shown as a total per instantie (each number is a different instantie)?
So for example: between the 1st of January and the 1st of June, a list of all XX numbers (there are 36) with their total behind each.
What I have is the following. It works great and shows all XX's in a nice table, but for the entire table, not per date range. As soon as I add the 'between $date_from AND $date_to' condition, it fails.
<?php
$sql_rg_total="SELECT forms.Datum, x.f1,Count(x.f1)
FROM
(SELECT XX1 As F1 FROM forms
UNION ALL
SELECT XX2 As F1 FROM forms
UNION ALL
SELECT XX3 As F1 FROM forms) x
WHERE x.f1 = '$subcat_id'
GROUP BY x.f1";
$resultvv=mysqli_query($conn, $sql_rg_total);
if (mysqli_num_rows($resultvv) > 0) {
while ($rowvv = mysqli_fetch_assoc($resultvv)) {
$subnr = $rowvv['Count(x.f1)'];
echo $subnr;
}
}
?>
By the way $subcat_id is from another table which connects the number to a name.
I have tried to write it as clearly as I could. I know it's a bit tough, haha. Thanks anyway for any input. Really stuck.
This query should do it:
SELECT SUM(x.c) AS c
FROM (
SELECT ((XX1 = '$subcat_id') + (XX2 = '$subcat_id') + (XX3 = '$subcat_id')) AS c
FROM forms
WHERE Datum BETWEEN '$date_from' AND '$date_to'
) x
The value of a boolean condition is 1 when it's true, 0 when it's false. So (XX1 = '$subcat_id') + (XX2 = '$subcat_id') + (XX3 = '$subcat_id') adds up the number of columns that match in a row, then SUM(x.c) totals them over all rows in the date range.
You don't need GROUP BY, since it's the same column that you're filtering in the WHERE condition (and now in the SELECT expression). And this moves the date condition into the subquery.
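The boolean-sum trick can be tried out on the two sample rows from the question. This is just a Python sketch of the same arithmetic, with a made-up function name count_uses and rows given as (Datum, XX1, XX2, XX3) tuples; it is not a replacement for the query.

```python
from datetime import date


def count_uses(rows, subcat_id, date_from, date_to):
    """Count how often subcat_id appears in XX1..XX3 within the date range."""
    total = 0
    for datum, xx1, xx2, xx3 in rows:
        if date_from <= datum <= date_to:  # inclusive, like BETWEEN
            # Each comparison is True (1) or False (0), as in the SQL.
            total += (xx1 == subcat_id) + (xx2 == subcat_id) + (xx3 == subcat_id)
    return total


forms = [
    (date(2022, 9, 8), 4, 23, 7),
    (date(2022, 9, 10), 8, 5, 0),
]
print(count_uses(forms, 23, date(2022, 1, 1), date(2022, 12, 31)))  # 1
print(count_uses(forms, 4, date(2022, 9, 9), date(2022, 12, 31)))   # 0
```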
I can't figure out how to select the matches between 2 teams regardless of which one played at home or away.
Example:
I have this table:
MatchID | status | date | short (home) | opponent (Away)
1 ENDED XXX TEAM A TEAM B
2 ENDED XXX TEAM B TEAM A
3 ENDED XXX TEAM C TEAM B
4 ENDED XXX TEAM D TEAM A
I have a lot of matches and I want to make a module that shows all previous matches between team A and team B (regardless of who was at home or away).
Right now this is my code, but it only shows 1 of the 2 possible matches.
$ergebnis = safe_query("SELECT * FROM table WHERE status='ENDED' AND ((opponent = '".$opp_match."' AND short = '".$short_match."') OR (opponent = '".$short_match."' AND short = '".$opp_match."')) ORDER BY date LIMIT 0,5");
And I want LIMIT 0,5: I just want the last 5 matches between the 2 teams.
"opp_match" and "short_match" come from the other module. When I check a match, I can look up the home team and away team with them.
With or without the limit I can't show more than one result.
I just want to show matchID 1 and 2, but right now I'm only getting matchID 1.
EDIT: I tried another way but I'm loading all ended matches.
$ergebnis = safe_query("SELECT * FROM ".PREFIX."upcoming WHERE status='ENDED'");
$i=1;
while($ds=mysql_fetch_array($ergebnis)) {
if($ds['short']==$short_match AND $ds['opponent']==$opp_match) {
eval ("\$matches = \"".gettemplate("matches")."\";");
echo $matches;
}
elseif($ds['opponent']==$short_match AND $ds['short']==$opp_match) {
eval ("\$matches = \"".gettemplate("matches")."\";");
echo $matches;
}
else echo '';
$i++;
}
Will this work?
SELECT *
FROM table
WHERE status = "ENDED"
AND
(
(opponent = "team1" AND short = "team2")
OR
(opponent = "team2" AND short = "team1")
)
ORDER BY MatchID DESC
LIMIT 5;
This assumes that if one match has a smaller ID than another, it has been played before the other.
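The intent of the WHERE clause, "these two teams, in either order", can be written down compactly as a set comparison. The following Python sketch uses the sample rows from the question; the function name head_to_head and the dictionary layout are assumptions made for the illustration.

```python
def head_to_head(matches, team1, team2, limit=5):
    """Return the most recent ended matches between two teams, newest first."""
    pair = {team1, team2}
    played = [
        m for m in matches
        # The set comparison ignores which side was home and which was away.
        if m["status"] == "ENDED" and {m["short"], m["opponent"]} == pair
    ]
    # Same assumption as the query: a higher MatchID means a later match.
    played.sort(key=lambda m: m["MatchID"], reverse=True)
    return played[:limit]


matches = [
    {"MatchID": 1, "status": "ENDED", "short": "TEAM A", "opponent": "TEAM B"},
    {"MatchID": 2, "status": "ENDED", "short": "TEAM B", "opponent": "TEAM A"},
    {"MatchID": 3, "status": "ENDED", "short": "TEAM C", "opponent": "TEAM B"},
    {"MatchID": 4, "status": "ENDED", "short": "TEAM D", "opponent": "TEAM A"},
]
print([m["MatchID"] for m in head_to_head(matches, "TEAM A", "TEAM B")])  # [2, 1]
```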
I have a table that shows me when people are available to work, like the following:
+------+---------------------+---------------------+
| name | start | end |
+------+---------------------+---------------------+
| Odin | 2015-07-01 11:00:00 | 2015-07-01 11:30:00 |
| Thor | 2015-07-01 11:00:00 | 2015-07-01 11:30:00 |
| Odin | 2015-07-01 11:20:00 | 2015-07-01 12:45:00 |
| Odin | 2015-07-01 12:30:00 | 2015-07-01 15:30:00 |
| Thor | 2015-07-01 15:00:00 | 2015-07-01 17:00:00 |
+------+---------------------+---------------------+
I'd like to check if a specific person is available to work in a given range. For example, I want to have a PHP function that returns the names of people available to work in a given range, like so: canWork($start, $end)
The important part is handling the overlaps, especially since the table could be very, very large. For example, if I called canWork('2015-07-01 11:10:00', '2015-07-01 15:30:00') I would expect to get Odin back, given that the 1st, 3rd and 4th rows of the table together cover that range.
Is there an easy way to do this with MySQL? Or PHP?
Try to avoid looping over data in this sort of large-data situation. In a similar exercise, SQL was able to deliver in seconds what took hours in code. Having a smart look at the data pays off.
The smart step here is: you can reduce the number of possible matches by checking the SUM of the time. The length of the requested range must be less than or equal to the SUM of the time in the records.
However, since a record's start time can be earlier than the starttime you are looking for, and its end time can be later than the endtime you are looking for, you first have to find the end time closest to endtime and the start time closest to starttime.
(end is a keyword in MySQL, so the column name is quoted with backticks here; endtime and starttime are the variables for the schedule check.)
Start time per user (last possible):
SELECT name,MAX(start) AS MAX_start
FROM scheduleTable
WHERE start<=starttime
GROUP BY name;
End time per user (first possible)
SELECT name,MIN(`end`) AS MIN_end
FROM scheduleTable
WHERE `end`>=endtime
GROUP BY name;
Joining these together gives the subset of candidate users, which can then be filtered further:
SELECT a.name, MAX_start, MIN_end
FROM
(SELECT name,MIN(`end`) AS MIN_end
FROM scheduleTable
WHERE `end`>=endtime
GROUP BY name) a
INNER JOIN
(SELECT name,MAX(start) AS MAX_start
FROM scheduleTable
WHERE start<=starttime
GROUP BY name) b ON a.name=b.name;
This will give you a schedule with a valid end as close as possible to the endtime indicated for scheduling purpose but at least equal to the indicated endtime.
Applying the fact that all the time frames together must at least be equal to the endtime-starttime:
SELECT st.name
FROM scheduleTable st
INNER JOIN (
    SELECT a.name, MAX_start AS start, MIN_end AS `end`
    FROM
        (SELECT name, MIN(`end`) AS MIN_end
        FROM scheduleTable
        WHERE `end`>=endtime
        GROUP BY name) a
    INNER JOIN
        (SELECT name, MAX(start) AS MAX_start
        FROM scheduleTable
        WHERE start<=starttime
        GROUP BY name) b ON a.name=b.name
) et ON st.name=et.name
/* keep only the records that lie inside the closest start/end found above */
WHERE st.start>=et.start AND st.`end`<=et.`end`
GROUP BY st.name
HAVING SUM(st.`end`-st.start)>=(endtime-starttime);
You might have to manipulate the start and end time to unix time or use mysql date time functions for the calculations.
There still might be gaps: Those need a second check. For this use the group_concat to get some data we can pass as 1 call into a function. The function results in 0 for: no gaps found, 1 for gaps found:
SELECT a.name
FROM (
    SELECT st.name,
        GROUP_CONCAT(st.start ORDER BY st.start ASC SEPARATOR ',') starttimelist,
        GROUP_CONCAT(st.`end` ORDER BY st.`end` ASC SEPARATOR ',') endtimelist
    FROM scheduleTable st
    INNER JOIN (
        SELECT a.name, MAX_start AS start, MIN_end AS `end`
        FROM
            (SELECT name, MIN(`end`) AS MIN_end
            FROM scheduleTable
            WHERE `end`>=endtime
            GROUP BY name) a
        INNER JOIN
            (SELECT name, MAX(start) AS MAX_start
            FROM scheduleTable
            WHERE start<=starttime
            GROUP BY name) b ON a.name=b.name
    ) et ON st.name=et.name
    WHERE st.start>=et.start AND st.`end`<=et.`end`
    GROUP BY st.name
    HAVING SUM(st.`end`-st.start)>=(endtime-starttime)
) a
WHERE gapCheck(starttimelist,endtimelist)=0;
WARNING: Do not add DISTINCT to the GROUP_CONCAT: the starttimelist and endtimelist could then have different lengths and the gapCheck function would fail.
The function gapCheck:
In this function the first start time and the last end time can be ignored: the first start time is at or before starttime and the last end time is at or after endtime. So no boundary checks are needed; only the transitions between consecutive records have to be checked for gaps.
/* In the mysql client, change the DELIMITER before creating the function */
CREATE FUNCTION gapCheck(starttimeList TEXT, endtimeList TEXT)
RETURNS INT DETERMINISTIC
BEGIN
    DECLARE curStart, prevEnd DATETIME;
    DECLARE gap INT DEFAULT 0;
    /* Walk both lists in step: the lists are of the same length */
    WHILE starttimeList <> '' DO
        SET curStart = SUBSTRING_INDEX(starttimeList, ',', 1);
        /* prevEnd is NULL on the first record, so this is not true and the
         * check is skipped: on the first record we can not check anything */
        /* If previous end time < current start time: we have a gap */
        IF prevEnd < curStart THEN
            SET gap = 1;
        END IF;
        /* save the end time for the next loop */
        SET prevEnd = SUBSTRING_INDEX(endtimeList, ',', 1);
        /* strip the first element and its comma: strings for the next iteration */
        SET starttimeList = SUBSTRING(starttimeList, LENGTH(SUBSTRING_INDEX(starttimeList, ',', 1)) + 2);
        SET endtimeList = SUBSTRING(endtimeList, LENGTH(SUBSTRING_INDEX(endtimeList, ',', 1)) + 2);
    END WHILE;
    RETURN gap;
END;
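The idea behind the gap check, comparing the separately sorted start and end lists position by position, can be tried out on the sample rows. This is a minimal Python sketch only; the names has_gap and ts are invented for the illustration. A gap exists exactly when some start lies after the end at the previous position of the sorted end list.

```python
from datetime import datetime


def has_gap(intervals):
    """Return True if the (start, end) intervals leave a gap between them."""
    starts = sorted(start for start, _ in intervals)
    ends = sorted(end for _, end in intervals)
    # Compare the i-th smallest end with the (i+1)-th smallest start,
    # which is what the two GROUP_CONCAT lists provide.
    return any(next_start > end for end, next_start in zip(ends, starts[1:]))


def ts(text):
    """Parse a 'YYYY-MM-DD HH:MM:SS' string."""
    return datetime.fromisoformat(text)


odin = [
    (ts("2015-07-01 11:00:00"), ts("2015-07-01 11:30:00")),
    (ts("2015-07-01 11:20:00"), ts("2015-07-01 12:45:00")),
    (ts("2015-07-01 12:30:00"), ts("2015-07-01 15:30:00")),
]
thor = [
    (ts("2015-07-01 11:00:00"), ts("2015-07-01 11:30:00")),
    (ts("2015-07-01 15:00:00"), ts("2015-07-01 17:00:00")),
]
print(has_gap(odin))  # False
print(has_gap(thor))  # True
```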
I think the shortest way to do this would be to
1) First merge all timelines for the same person with overlaps.
For example, row 1 and row 3 would be merged to change the end time of row 1 to '2015-07-01 12:45:00' (and row 3 would be deleted or marked used), and then row 1 and row 4 would be merged to again change the end time of row 1 to '2015-07-01 15:30:00'.
2) Once you have a table of non-overlapping timelines, this is a simple problem of finding rows where start <= $start and end >= $end.
For 1) I would prefer executing this process in PHP by first copying the whole table into a data structure:
$a = array();
// in a loop over the rows of a "select all" query: foreach ($rows as $row) {
$a[$name][$start] = $end;
// } end of loop
And then removing all overlaps from that data structure:
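foreach ($a as $currName => &$timeArray) {
    ksort($timeArray);
    removeOverlaps($timeArray);
}
unset($timeArray); // break the reference left behind by foreach

// Merges overlapping timelines in place; expects the array sorted by start time
function removeOverlaps(&$timeArray) {
    $allKeys = array_keys($timeArray);
    $arrLength = count($allKeys);
    for ($i = 0; $i < $arrLength; ++$i) {
        $start = $allKeys[$i];
        if (array_key_exists($start, $timeArray)) {
            for ($j = $i + 1; $j < $arrLength; ++$j) {
                $newStart = $allKeys[$j];
                if (!array_key_exists($newStart, $timeArray)) {
                    continue; // already merged into an earlier row
                }
                $end = $timeArray[$start]; // re-read: may have grown by a merge
                $newEnd = $timeArray[$newStart];
                if ($newStart <= $end) {
                    if ($newEnd > $end) {
                        $timeArray[$start] = $newEnd;
                    }
                    unset($timeArray[$newStart]);
                }
            }
        }
    }
}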
Then continue with 2).
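Both steps together might look roughly like the following. This is a Python sketch of the same merge-then-check idea, not a translation of the PHP above; the names merge_intervals, can_work and ts are assumptions, and the rows are the sample data from the question.

```python
from collections import defaultdict
from datetime import datetime


def merge_intervals(intervals):
    """Step 1: merge overlapping (start, end) pairs into disjoint ones."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def can_work(rows, start, end):
    """Step 2: names whose merged availability fully covers [start, end]."""
    by_name = defaultdict(list)
    for name, s, e in rows:
        by_name[name].append((s, e))
    return sorted(
        name for name, intervals in by_name.items()
        if any(s <= start and end <= e for s, e in merge_intervals(intervals))
    )


def ts(text):
    """Parse a 'YYYY-MM-DD HH:MM:SS' string."""
    return datetime.fromisoformat(text)


rows = [
    ("Odin", ts("2015-07-01 11:00:00"), ts("2015-07-01 11:30:00")),
    ("Thor", ts("2015-07-01 11:00:00"), ts("2015-07-01 11:30:00")),
    ("Odin", ts("2015-07-01 11:20:00"), ts("2015-07-01 12:45:00")),
    ("Odin", ts("2015-07-01 12:30:00"), ts("2015-07-01 15:30:00")),
    ("Thor", ts("2015-07-01 15:00:00"), ts("2015-07-01 17:00:00")),
]
print(can_work(rows, ts("2015-07-01 11:10:00"), ts("2015-07-01 15:30:00")))  # ['Odin']
```

Sorting first means each interval only has to be compared with the last merged one, so the merge is a single pass per person.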
I got the two tables(Table1 and Table2):
Table1:
id hits url
1 11 a
2 5 b
3 6 c
4 99 d
5 14 e
Table2:
id url 2014.04.13 2014.04.14
1 a 0 5
2 b 0 1
3 c 0 3
4 d 0 60
5 e 0 10
hi all,
Table1 contains the actual hits (which are always up to date) and Table2 the statistics (which are taken every day at midnight). The columns id (unique number) and url are the same in both tables, so they have the same number of rows.
So every day I create a new column (named with today's date) in Table2 and copy the hits column from Table1 into it.
First i alter Table2:
$st = $pdo->prepare("ALTER TABLE Table2 ADD `$today_date` INT(4) NOT NULL");
$st->execute();
Then i cache all entries i need from Table1:
$c = 0;
$id = array();
$hits = array();
$sql = "SELECT id, hits FROM Table1 ORDER BY id ASC";
$stmt = $pdo->query($sql);
while($row = $stmt->fetch(PDO::FETCH_ASSOC))
{
$id[$c] = $row['id'];
$hits[$c] = $row['hits'];
$c++;
}
At last i update Table2:
for ($d = 0 ; $d < $c ; $d++)
{
$id_insert = $id[$d];
$sql = "UPDATE DOWNLOADS_TEST SET `$datum_det_dwnloads`=? WHERE id=?";
$q = $pdo->prepare($sql);
$q->execute(array($hits[$d], $id[$d]));
if($q->rowCount() == 1 or $hits[$d] == 0) // success
$hits[$d] = 0;
else // error inserting (e.g. index not found)
$d_error = 1; // error :( //
}
So what I need is to copy (insert) a column from one table to another.
The two tables have ~2000 rows each, and the copying as described above takes around 40 seconds. The bottleneck is the last part (updating Table2), as I found out.
One thing I found is to do multiple updates in one query. Is there anything I can do besides that?
I hope you realise that at some point your table will have an unreasonable number of columns and will be highly inefficient. I strongly advise you to use another solution, for example a separate table that holds the data for each row for each day.
Let's say you have a table with 2000 rows and two columns: ID and URL. Now you want to know the count of hits for each URL so you add column HITS. But then you realise you will need to know the count of hits for each URL for every date, so your best bet is to split the tables. At this moment you have one table:
Table A (A_ID, URL, HITS)
Now remove HITS from Table A and create Table B with ID and HITS attributes. Now you have:
Table A (A_ID, URL)
Table B (B_ID, HITS)
Next move is to connect those two tables:
Table A (A_ID, URL)
Table B (B_ID, A_ID, HITS)
Where A_ID is a foreign key to attribute "A_ID" of Table A. In the end it holds the same information as the first step, but now it's easy to add a date attribute to Table B:
Table A (A_ID, URL)
Table B (B_ID, A_ID, HITS, DATE)
And you have your solution for the database structure. You will have a lot of entries in Table B, but it's still better than a lot of columns. Example of how it would look:
Table A | A_ID | URL
0 index
1 contact
Table B | B_ID | A_ID | HITS | DATE
0 0 23 12.04.2013
1 1 12 12.04.2013
2 0 219 13.04.2013
3 1 99 13.04.2013
You can also make unique index of A_ID and DATE in Table B, but I prefer to work on IDs even on linking tables.
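With one row per URL per day, the nightly copy no longer needs a loop of single-row UPDATEs: one INSERT ... SELECT statement can take the whole snapshot. The sketch below only illustrates that idea with Python's built-in sqlite3 module and the sample data from the question; the table and column names (table1, table_b, day) and the function snapshot are assumptions, and the column is called day rather than DATE just for this example. MySQL offers the same INSERT ... SELECT form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# table1 holds the live counters, as in the question.
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, hits INTEGER, url TEXT)")
# table_b holds one row per URL per day, as proposed above.
conn.execute(
    "CREATE TABLE table_b ("
    " b_id INTEGER PRIMARY KEY,"
    " a_id INTEGER REFERENCES table1(id),"
    " hits INTEGER,"
    " day TEXT,"
    " UNIQUE (a_id, day))"
)
conn.executemany(
    "INSERT INTO table1 (id, hits, url) VALUES (?, ?, ?)",
    [(1, 11, "a"), (2, 5, "b"), (3, 6, "c"), (4, 99, "d"), (5, 14, "e")],
)


def snapshot(conn, day):
    """Copy the current hit counters into table_b for the given day."""
    # One set-based statement instead of one UPDATE per row.
    conn.execute(
        "INSERT INTO table_b (a_id, hits, day) SELECT id, hits, ? FROM table1",
        (day,),
    )
    conn.commit()


snapshot(conn, "2014-04-14")
rows = conn.execute("SELECT a_id, hits FROM table_b ORDER BY a_id").fetchall()
print(rows)  # [(1, 11), (2, 5), (3, 6), (4, 99), (5, 14)]
```

The UNIQUE (a_id, day) constraint makes a second snapshot for the same day fail instead of silently duplicating rows.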
position | Average | gpmp
1 70.60 2.0
2 60.20 2.3
3 59.80 4.8
4 59.80 4.8
5 45.70 5.6
Hi all,
As in the table above, I need to assign the position according to the highest average and the lowest gpmp. But when both the average and gpmp are the same, the position needs to be the same.
For example, positions 3 and 4 have the same average and gpmp. How do I write a MySQL query or a PHP function that detects the same average and gpmp and changes position 4 to 3?
This means that after the function has run, the table will look like the one below.
position | Average | gpmp
1 70.60 2.0
2 60.20 2.3
3 59.80 4.8
3 59.80 4.8
5 45.70 5.6
Here's a simple way to update the table as you described in your post - taking the sequential positions and updating them accordingly. It doesn't calculate the positions or anything, just uses the data already there:
UPDATE `table` t SET position = (
SELECT MIN(position) FROM (SELECT * FROM `table`) t2 WHERE t.Average = t2.Average AND t.gpmp = t2.gpmp
)
I'd give something like the following a try, though it does assume a primary key is on this table. Without a primary key you're going to have issues updating specific rows easily / you'll have a lot of duplicates.
So for this example I'll assume the table is as follows
someTable (
pkID (Primary Key),
position,
Average,
gpm
)
So the following INSERT would do the job I expect
INSERT INTO someTable (
    pkID,
    position
)
SELECT
    someTable.pkID,
    calcTable.position
FROM someTable
INNER JOIN (
    SELECT
        MIN(c.position) AS position,
        c.Average,
        c.gpm
    FROM (
        -- Calculate the position for each Average/gpm combination
        SELECT
            @p := @p + 1 AS position,
            someTable.Average,
            someTable.gpm
        FROM (
            SELECT @p := 0
        ) v, someTable
        ORDER BY
            someTable.Average DESC,
            someTable.gpm ASC
    ) c
    -- Now regroup to get 1 position for each combination (the lowest position)
    GROUP BY c.Average, c.gpm
) AS calcTable
-- And then join this calculated table back onto the original
ON (calcTable.Average, calcTable.gpm) = (someTable.Average, someTable.gpm)
-- And rely on the PK IDs clashing to allow update
ON DUPLICATE KEY UPDATE position = VALUES(position)
(pseudo code)
select * from table
get output into php var
foreach (php row of data)
is row equal to previous row?
yes - don't increment row counter, increment duplicate counter
no - increment row counter with # of duplicates and reset duplicate counter
save current row as 'previous row'
next
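The pseudo code above could be fleshed out along these lines. It is a Python sketch only, with a made-up function name rank_with_ties and rows given as (Average, gpmp) tuples; tied rows share the position of the first of them and the following position is skipped, as in the desired output.

```python
def rank_with_ties(rows):
    """Rank (average, gpmp) rows: highest average first, then lowest gpmp."""
    ordered = sorted(rows, key=lambda row: (-row[0], row[1]))
    ranked = []
    for index, row in enumerate(ordered, start=1):
        if ranked and ranked[-1][1] == row:
            # Same values as the previous row: reuse its position.
            position = ranked[-1][0]
        else:
            # Otherwise the position is the running row number,
            # which automatically skips over the duplicates.
            position = index
        ranked.append((position, row))
    return ranked


rows = [(70.60, 2.0), (60.20, 2.3), (59.80, 4.8), (59.80, 4.8), (45.70, 5.6)]
for position, (average, gpmp) in rank_with_ties(rows):
    print(position, average, gpmp)
# positions printed: 1, 2, 3, 3, 5
```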
you can try something like this in php:
$d= mysql_query('select distinct gpmp from tablename order by gpmp');
$pos= 1;
while($r= mysql_fetch_array($d)){
mysql_query('update tablename set position='.$pos.' where gpmp='.$r['gpmp']);
$pos++;
}
You only need to "expand" the idea to take the average into account too.