MySql query for cohort analysis

I am working with MySQL and Symfony2. I need to build a cohort analysis table that compares how many users in each cohort log in to the website at least once a week after they register. First, I get the number of registered users by week; these are my cohorts.
SELECT DATE_FORMAT(date_added,'%d %b %y') as reg_date, COUNT(*) AS user_count
FROM user
WHERE date_added>='2016-02-01' AND date_added<=NOW()
GROUP BY WEEK(date_added)
This query gets the distinct users who logged in to the website, by week.
SELECT WEEK(login_date) AS week, COUNT(DISTINCT user_id) AS user_count
FROM user_log
WHERE login_date>='2016-02-01' AND login_date<=NOW()
GROUP BY WEEK(login_date)
My problem: I can't figure out how to group the logged-in users by cohort and compare the cohorts week by week. I hope I stated the problem clearly. English is not my first language. Thanks.
Sample data:
user table
id | date_added (in WEEK() format)
A | 1
B | 1
C | 1
D | 2
E | 2
F | 2
G | 2
------------
user_log table
user_id | login_date (in WEEK() format)
A | 1
B | 1
B | 1
A | 2
D | 2
A | 2
D | 2
E | 2
Expected table. Cohort 1 is the users registered in week 1, cohort 2 those registered in week 2, etc. Size is the number of registered users. Week 1 is how many users logged back in to the website in the first week after registration; Week 2 is how many logged back in during the second week after registration.
Cohort   | Size | Week 1 | Week 2
Cohort 1 | 3    | 2      | 1
Cohort 2 | 4    | 2      | -

This is borrowed from my modification of @Andriy M's answer to this question: Cohort analysis in SQL
This query gets unique user logins by week after registering.
SELECT DISTINCT
user_id,
FLOOR(DATEDIFF(user_log.login_date, user.date_added)/7) AS Offset
FROM user_log
LEFT JOIN user ON (user.id = user_log.user_id)
WHERE user_log.login_date >= CURDATE() - INTERVAL 14 DAY
This query gets all the users created in the past 14 days and formats the date to the week they signed up:
SELECT
id,
DATE_FORMAT(date_added, '%x-%v') AS cohort
FROM user
WHERE date_added >= CURDATE() - INTERVAL 14 DAY
We can put those two queries together to get a table with how many people came back after registering:
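-- %x-%v (year and week, Monday first) is used for both formatting and parsing,
-- so the cohort label maps back to the Monday of the same week.
SELECT STR_TO_DATE(CONCAT(u.cohort, ' Monday'), '%x-%v %W') as date,
SUM(s.Offset = 0) AS size,
SUM(s.Offset = 1) AS Week1,
SUM(s.Offset = 2) AS Week2
FROM (
SELECT
id,
DATE_FORMAT(date_added, '%x-%v') AS cohort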
FROM user
WHERE date_added >= CURDATE() - INTERVAL 21 DAY
) as u
LEFT JOIN (
SELECT DISTINCT
user_id,
FLOOR(DATEDIFF(user_log.login_date, user.date_added)/7) AS Offset
FROM user_log
LEFT JOIN user ON (user.id = user_log.user_id)
WHERE user_log.login_date >= CURDATE() - INTERVAL 21 DAY
) as s
ON s.user_id = u.id
GROUP BY u.cohort
ORDER BY u.cohort
Since we aren't counting how many people registered in a given week, we are assuming that they logged in at least once in the week they registered, so that the size column is accurate.
You'll also have to rework this to get a number for the cohort instead of the date, but I find dates more helpful.
You can extend this to more weeks: change the number of days after INTERVAL in both subqueries, and add more SUM(...) lines to the main SELECT statement.
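The grouping logic can be checked outside the database against the sample data from the question. The following is a minimal Python sketch, not the query itself: it assumes the week numbers from the sample tables, the function name cohort_table is hypothetical, and it counts cohort size from registrations instead of from logins. Offset 0 here corresponds to the "Week 1" column of the expected table in the question.

```python
from collections import defaultdict

# Sample data from the question, expressed as week numbers.
users = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2, "F": 2, "G": 2}
logins = [("A", 1), ("B", 1), ("B", 1), ("A", 2),
          ("D", 2), ("A", 2), ("D", 2), ("E", 2)]


def cohort_table(users, logins):
    """Return {cohort_week: {"size": n, offset: distinct_user_count, ...}}."""
    # Cohort size comes from registrations, so users who never log in still count.
    size = defaultdict(int)
    for reg_week in users.values():
        size[reg_week] += 1

    # Collect distinct users per (cohort, week offset), like SELECT DISTINCT.
    active = defaultdict(set)
    for user_id, login_week in logins:
        reg_week = users[user_id]
        active[(reg_week, login_week - reg_week)].add(user_id)

    table = {week: {"size": n} for week, n in sorted(size.items())}
    for (reg_week, offset), ids in active.items():
        table[reg_week][offset] = len(ids)
    return table


table = cohort_table(users, logins)
for week, row in table.items():
    print(f"Cohort {week}: {row}")
```

Cohort 2 has no entry for offset 1 because no later week exists in the sample data, which matches the "-" in the expected table.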

Related

How to select all entries of a user from a database?

I am trying to make a "top purchaser" module on my store and I am a bit confused about the MySQL query.
I have a table with all transactions and I need to select the person (which could have one or many transactions) with the highest amount of money spent in the past month.
What I have:
name | money spent
------------------
john | 50
mike | 12
john | 10
jane | 504
carl | 99
jane | 12
jane | 1
What I want to see from a query:
name | money spent last month
-----------------------------
jane | 517
carl | 99
john | 60
mike | 12
How do I do that?
I do not really seem to find many good solutions since my MySQL query skills are quite basic. I thought of making a table in which money is added to the user when he buys something.

That's a simple aggregate query:
SELECT t.name, SUM(t.moneyspent) money_spent_last_month
FROM mytable t
GROUP BY t.name
-- order by the alias itself; it is not a column of t
ORDER BY money_spent_last_month DESC
LIMIT 1
The query sums the total money spent per customer name. The results are ordered by descending total money spent, and only the first row is retained.
To restrict the data to last month, you need a column in the table that tracks the transaction date, say transaction_date; then you can add a WHERE clause to the query, like:
SELECT t.name, SUM(t.moneyspent) money_spent_last_month
FROM mytable t
WHERE
-- from the first day of the previous month (inclusive)
t.transaction_date >=
DATE_ADD(LAST_DAY(DATE_SUB(NOW(), INTERVAL 2 MONTH)), INTERVAL 1 DAY)
-- up to the first day of the current month (exclusive)
AND t.transaction_date <
DATE_ADD(LAST_DAY(DATE_SUB(NOW(), INTERVAL 1 MONTH)), INTERVAL 1 DAY)
GROUP BY t.name
ORDER BY money_spent_last_month DESC
LIMIT 1
This method is usually more efficient than using DATE_FORMAT to format dates as string and compare the results.
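If "last month" is taken to mean the previous calendar month, the filter is a half-open range from the first day of the previous month up to, but not including, the first day of the current month. The boundaries can also be computed in application code and passed to the query as parameters. A minimal Python sketch under that assumption (the function name previous_month_window is hypothetical):

```python
from datetime import date, datetime


def previous_month_window(today):
    """Return (start, end) datetimes covering the previous calendar month.

    The window is half-open: start <= t < end.
    """
    first_of_this_month = date(today.year, today.month, 1)
    if today.month == 1:
        # January wraps back to December of the previous year.
        first_of_prev_month = date(today.year - 1, 12, 1)
    else:
        first_of_prev_month = date(today.year, today.month - 1, 1)
    start = datetime.combine(first_of_prev_month, datetime.min.time())
    end = datetime.combine(first_of_this_month, datetime.min.time())
    return start, end


start, end = previous_month_window(date(2019, 3, 20))
print(start, end)
```

Comparing the raw column against two precomputed bounds keeps the condition index-friendly, which is the same reason for avoiding DATE_FORMAT on the column.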

Get total hours with PHP & MySQL

I have the following table
id | user_id | date | status
1 | 53 | 2018-09-18 06:59:54 | 1
2 | 62 | 2018-09-18 07:00:16 | 1
3 | 53 | 2018-09-18 09:34:12 | 2
4 | 53 | 2018-09-18 12:16:27 | 1
5 | 53 | 2018-09-18 18:03:19 | 2
6 | 62 | 2018-09-18 18:17:41 | 2
I would like to get the total working hours (from date range) and group them by user_id
UPDATE
The system does not "require" a check-out, so if there is only one value, can we set a default check-out time, let's say 19:00:00? If not, I can check every day at 21:00:00 whether a check-out time is missing and manually insert it at 19:00:00.
UPDATE 2
I have added a new field "status" to the table, so the very first check-in of the day has status = 1 and every second check-in has status = 2.
So if a user checks in for the third time during the day, the status will be 1 again, etc.
I hope this will make things easier
Thanks

When a user has multiple check-ins and check-outs within a day:
Using a correlated subquery, we can find the corresponding "checkout_time" for every "checkin_time".
Note the use of the IFNULL() and TIMESTAMP() functions to default "checkout_time" to 19:00:00 when there is no corresponding entry.
Then, treating this enhanced data set as a derived table, we group it by user_id and date. The date (yyyy-mm-dd) can be determined using the DATE() function.
Finally, use the TIMESTAMPDIFF() function with SUM aggregation to determine the total work seconds for a user_id on a particular date.
You can easily convert these total seconds to hours, either in your application code or in the query itself (divide the seconds by 3600).
I preferred to compute in seconds because TIMESTAMPDIFF() returns integers only, so computing in hours could introduce truncation errors when there are multiple check-ins and check-outs.
Use the following query (replace your_table with your actual table name):
SELECT inner_nest.user_id,
DATE(inner_nest.checkin_time) AS work_date,
SUM(TIMESTAMPDIFF(SECOND,
inner_nest.checkin_time,
inner_nest.checkout_time)) AS total_work_seconds
FROM
(
SELECT t1.user_id,
t1.date as checkin_time,
t1.status,
IFNULL( (
SELECT t2.date
FROM your_table AS t2
WHERE t2.user_id = t1.user_id
AND t2.status = 2
AND t2.date > t1.date
AND DATE(t2.date) = DATE(t1.date)
ORDER BY t2.date ASC LIMIT 1
),
TIMESTAMP(DATE(t1.date),'19:00:00')
) AS checkout_time
FROM `your_table` AS t1
WHERE t1.status = 1
) AS inner_nest
GROUP BY inner_nest.user_id, DATE(inner_nest.checkin_time)
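To see what the query should return for the rows in the question, the same pairing can be reproduced in application code. This is a rough Python sketch of the logic, not a replacement for the query; the function name work_seconds is hypothetical, and it assumes status 1 means check-in and status 2 means check-out, as described in the question.

```python
from datetime import datetime, time

# Rows from the question: (user_id, timestamp, status); 1 = check-in, 2 = check-out.
rows = [
    (53, "2018-09-18 06:59:54", 1),
    (62, "2018-09-18 07:00:16", 1),
    (53, "2018-09-18 09:34:12", 2),
    (53, "2018-09-18 12:16:27", 1),
    (53, "2018-09-18 18:03:19", 2),
    (62, "2018-09-18 18:17:41", 2),
]
DEFAULT_CHECKOUT = time(19, 0, 0)


def work_seconds(rows):
    """Return {(user_id, date): total seconds}, pairing each check-in with the
    next same-day check-out, or 19:00:00 when none exists."""
    parsed = sorted(
        (user, datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), status)
        for user, ts, status in rows
    )
    totals = {}
    for user, checkin, status in parsed:
        if status != 1:
            continue
        # Same conditions as the correlated subquery in the SQL above.
        later_checkouts = [
            ts for u, ts, s in parsed
            if u == user and s == 2 and ts > checkin and ts.date() == checkin.date()
        ]
        fallback = datetime.combine(checkin.date(), DEFAULT_CHECKOUT)
        checkout = min(later_checkouts, default=fallback)
        key = (user, checkin.date())
        totals[key] = totals.get(key, 0) + int((checkout - checkin).total_seconds())
    return totals


totals = work_seconds(rows)
for (user, day), seconds in sorted(totals.items()):
    print(user, day, seconds, round(seconds / 3600, 2))
```

For user 53 the two intervals are 9258 and 20812 seconds. Truncating each to whole hours before summing gives 2 + 5 = 7, while summing the seconds first gives 30070 seconds, or about 8.35 hours, which is why the query aggregates in seconds.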
Additional: The following solution works when there is a single check-in and a corresponding check-out on the same date.
First, group the data set by user_id and date. The date (yyyy-mm-dd) can be determined using the DATE() function.
Then use aggregate functions such as MIN() and MAX() to find the starting and closing time for a user_id on a particular date.
Finally, use the TIMESTAMPDIFF() function to determine the working hours for a user_id on a particular date (the difference between the closing and starting time).
Try the following query (replace your_table with your actual table name):
SELECT user_id,
DATE(`date`) AS working_date,
TIMESTAMPDIFF(HOUR, MIN(`date`), MAX(`date`)) AS working_hours
FROM your_table
GROUP BY
user_id,
DATE(`date`)

Use the TIMESTAMPDIFF function.
The query would look more like:
SELECT t1.user_id, TIMESTAMPDIFF(HOUR,t1.date,t2.date) as difference
FROM your_table t1
INNER JOIN your_table t2 on t1.user_id = t2.user_id
Group By t1.user_id
You can use this as a reference: TimeStampDiff

SQL query to count number of days with reappearing entries

I have a database with access control log entries:
time : datetime (this is the access timestamp)
src: text (this is the userid)
I want to get a list that shows, for the users who had access on the current day, on how many of the past 7 days they had access. The result should look like this:
number of days with access | count
1 | 30
2 | 54
3 | 123
4 | 843
5 | 3490
6 | 71
7 | 23
What I have so far:
The query below returns the number of users with log entry on 2015-03-08 that had also an entry on 2015-03-07.
SELECT Count(DISTINCT a.src)
FROM contacts AS a
LEFT JOIN contacts AS b
ON a.src = b.src
WHERE a.time BETWEEN Cast('2015-03-08 05:00:00' AS DATETIME) AND Cast('2015-03-09 05:00:00' AS DATETIME)
AND b.time BETWEEN Cast('2015-03-07 05:00:00' AS DATETIME) AND Cast('2015-03-08 05:00:00' AS DATETIME)
But I'm stuck on getting the count of users for each number of days, as described above. If there is no SQL-only solution, a (performant) approach using PHP would be fine as well. Thanks for any help.

I don't see any reason why you need to join the b table.
SELECT
DAY(a.time),
COUNT(DISTINCT a.src)
FROM contacts AS a
WHERE a.time
BETWEEN (TIMESTAMP(CURDATE()) - INTERVAL 1 WEEK)
AND TIMESTAMP(CONCAT(CURDATE(),' 23:59:59'))
GROUP BY DAY(a.time)

sql query to find users with three or more months in debt

I have a little app (PHP/MySQL) to manage condos. There's a condos table, an apartments table, an owners table and an account table.
In the account table I have the fields month_paid and year_paid (among others).
Each time someone pays the monthly fee, the table is updated with the number of the month and the year.
Here's some sample table structure:
condos table:
+----+------------+---------+
| id | condo_name | address |
+----+------------+---------+
apartments table:
+----+----------------+----------+
| id | apartment_name | condo_id |
+----+----------------+----------+
owners table:
+----+--------------+------------+
| id | apartment_id | owner_name |
+----+--------------+------------+
account table:
+----+----------+----------+------------+-----------+
| id | owner_id | condo_id | month_paid | year_paid |
+----+----------+----------+------------+-----------+
So, if I have a record in account table like this, it means this owner paid August 2012:
+----+----------+----------+------------+-----------+
| id | owner_id | condo_id | month_paid | year_paid |
+----+----------+----------+------------+-----------+
| 1 | 1 | 1 | 8 | 2012 |
+----+----------+----------+------------+-----------+
What I would like to know is how to make a SQL query (using PHP) to get the owners with three or more months in debt or, in other words, owners that have not paid the fee for the last three months or more.
If possible, the data should be grouped by condo, like this:
CONDO XPTO:
Owner 1: 3 months debt
Owner 2: 5 months debt
CONDO BETA:
Owner 1: 4 months debt
Owner 2: 6 months debt
Thanks

You need to write a query something like this:
SELECT
*
FROM
owners
JOIN
account
ON
owners.id = account.owner_id
WHERE
CONCAT( account.year_paid , '-' , account.month_paid , '-01') <= DATE_ADD( NOW(), INTERVAL -3 MONTH );
Sadly that's about all I can give you with the information you have provided. If you could show more detailed table structure, I could help you out more.

You are making it harder on yourself by storing it this way. Now you need to calculate the difference in months yourself. You cannot just compare months, because in January, for example, you also need to take the year into account.
SELECT *
FROM owners
JOIN account ON owners.id = account.owner_id
WHERE (
-- paid this year: plain month difference
account.year_paid = YEAR(NOW())
AND MONTH(NOW()) - account.month_paid >= 3
) OR (
-- paid last year: add 12 to bridge the year boundary
account.year_paid = YEAR(NOW()) - 1
AND 12 + MONTH(NOW()) - account.month_paid >= 3
) OR (
-- paid two or more years ago: always at least 3 months
account.year_paid < YEAR(NOW()) - 1
)
Better to just store the last paid time in a DATETIME column so you can use date functions.

To fix your account table, run the following. The DROP COLUMNs are optional; you might want to keep the columns if they have dependencies.
ALTER TABLE account ADD COLUMN date_paid DATETIME;
UPDATE account SET date_paid = CONCAT(year_paid,'-',month_paid,'-01');
-- ALTER TABLE account DROP COLUMN year_paid;
-- ALTER TABLE account DROP COLUMN month_paid;
This is how you’d get your data. I used LEFT OUTER JOINs in case you have any missing owner or condo records.
SELECT c.condo_name,
o.owner_name,
min(PERIOD_DIFF(DATE_FORMAT(now(), '%Y%m'), DATE_FORMAT(date_paid, '%Y%m'))) min_months_debt
FROM account a
LEFT OUTER JOIN condos c ON (c.id = a.condo_id)
LEFT OUTER JOIN owners o ON (a.owner_id = o.id)
WHERE a.date_paid <= DATE_ADD(NOW(), INTERVAL -3 MONTH)
GROUP BY c.condo_name, o.owner_name
ORDER BY c.condo_name
P.S. The above only works for condo owners who have made at least one payment. If you want to see condo owners who have never paid then you’re going to have to associate condos with owners outside of the accounts table. Or perhaps you create an account record when you associate a condo with an owner, in which case you don’t have a problem.

If I understand correctly, you have two different fields for the month and the year. Why?
If you had one field, say paid_date, this query would work for you:
SELECT owner_id from account WHERE NOW()>DATE_ADD(paid_date, INTERVAL 3 MONTH)
If you can't change the field, then sorry. I hope this is helpful.
UPD:
Then I'd suggest concatenating the two fields (year and month) in the query so the result looks like a real date in YYYY-MM-DD format, and using that in the DATE_ADD function instead of the paid_date field.

Try this,
select o.owner_name,c.condo_name from owners o join account a join condos c
on o.id=a.owner_id and c.id =a.condo_id where year(curdate())=a.year_paid
and a.month_paid<month(curdate())-3

I haven't seen any answers that actually produce what you are looking for. The following calculates the months in debt by calculating the current month minus the most recent payment date. It then concatenates the results into a string:
select c.condo_name, o.owner_name,
-- months elapsed since the most recent paid month, as a label
concat((YEAR(NOW())*12 + MONTH(NOW())) - MAX(a.year_paid*12 + a.month_paid), ' month(s) debt') as debt
from account a join
owners o
on o.id = a.owner_id join
condos c
on c.id = a.condo_id
group by c.condo_name, o.owner_name
order by 1, 2
If you want only delinquent payers, then add a HAVING clause after the GROUP BY (the condition uses an aggregate, so it cannot go in WHERE), to the effect of:
having (YEAR(NOW())*12 + MONTH(NOW())) - MAX(a.year_paid*12 + a.month_paid) > 1
Because you don't have the day of the month of the payment, there are some borderline conditions you might miss.
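The year*12+month arithmetic turns each (year, month) pair into a running month count, so the subtraction works across year boundaries. A small Python sketch of the same calculation (the function name months_in_debt is hypothetical, and the dates are made-up examples):

```python
from datetime import date


def months_in_debt(year_paid, month_paid, today):
    """Whole months between the last paid month and the current month."""
    return (today.year * 12 + today.month) - (year_paid * 12 + month_paid)


# Last payment covered August 2012, checked in December 2012.
print(months_in_debt(2012, 8, date(2012, 12, 15)))   # 4
# Last payment covered November 2012, checked in January 2013.
print(months_in_debt(2012, 11, date(2013, 1, 10)))   # 2
```

The day of the month plays no part in the result, which is the borderline behaviour mentioned above.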

mysql PHP query to contain count over join

I have this query which after checking various tutorials should work - but it doesn't.
$query="SELECT week, year, COUNT(week) AS week_no
FROM archive_agent_booking
LEFT JOIN invoice_additions ON invoice_additions.week = archive_agent_booking.week
WHERE client_id='$account_no' GROUP BY week, year ORDER BY week DESC";
The tables are as follows:
archive_agent_booking
+---------+----------+----------+----------+----------+---------+---------+
| job_id | week | year | desc | price | date | acc_no |
+---------+----------+----------+----------+----------+---------+---------+
invoice_additions
+---------+----------+----------+----------+----------+---------+
| acc_no | week | year | desc | am_price | am_date |
+---------+----------+----------+----------+----------+---------+
I basically want to count each week element from both tables and display them as one total even if one of the week values does not show in one of the tables. Don't know whether this is the best solution so I am open to alternatives.

select
week,
sum(items) as total_items
from
(
(select week, count(*) as items from archive_agent_booking group by week)
-- UNION ALL keeps both rows when the two tables have the same week and count
union all
(select week, count(*) from invoice_additions group by week)
) as combined
group by
week
Edit: I've made some big assumptions about what you want to see.
