应用-留存率与连续天数的通解
1.简述
留存率与连续天数是判断用户粘性的重要指标。
本文给出两者的一般解法,但:
1.使用时仍应注意日期的限制条件等实际要求
2.部分问题可能是此类问题的变种或基于此求解,多多思考实现举一反三
3.解法中的内容有些未必要保留,或可进一步优化,可根据实际情况删除
2.相关信息
# 每个用户的首次登录时间 SELECT logid,MIN(logdate) FROM login GROUP BY logid; # 每个用户的最后登录时间 SELECT logid,MAX(logdate) FROM login GROUP BY logid; # 用户数 SELECT COUNT(DISTINCT logid) FROM login; # 每天用户数 WITH T1 AS (SELECT DISTINCT logid,DATE(logdate) AS date FROM login) SELECT date, COUNT(logid) FROM T1 GROUP BY date; # 每天新用户数 WITH T1 AS (SELECT DISTINCT logid,DATE(logdate) AS date FROM login), T2 AS (SELECT logid, date,date=MIN(date) OVER(PARTITION BY logid) AS tag FROM T1) SELECT date,SUM(tag) FROM T2 GROUP BY date;
3.留存率
WITH T1 AS (SELECT DISTINCT logid AS id, DATE(logdate) AS date FROM login), # 实际情况logdate往往不是日期,而是日期时间,不便于后面使用,此处统一截断为日期 # 对于同一用户同一天登录多次的只保留一次 T2 AS (SELECT *, DATEDIFF(LEAD(date,1,date) OVER(PARTITION BY id ORDER BY date ASC),date) AS diff, # 取出当前用户当前日期的下一日期,求当前日期与下一日期的差值 # 差值为1说明该日期的次日再次登录,差值为0说明后续无登录 # 如要求三日留存率则LEAD(date,3),且diff=3,说明首次登录后(不含首日)第3日仍登录(即连续登录3天) # 如某用户20240101首次登录,三日留存指202402-202404都登录 date=MIN(date) OVER(PARTITION BY id) AS tag # 当前日期是否为该用户的首次登录日期 FROM T1) # 以上已得出计算留存率的所有信息,后续计算基于此即可方便地得出 # 如求新用户的平均次日留存率 WITH T1 AS (SELECT DISTINCT logid AS id, DATE(logdate) AS date FROM login), T2 AS (SELECT *, DATEDIFF(LEAD(date,1,date) OVER(PARTITION BY id ORDER BY date ASC),date) AS diff, date=MIN(date) OVER(PARTITION BY id) AS tag FROM T1) SELECT AVG(diff=1) FROM T2 WHERE tag=1; # 取出所有用户首次登录的记录,diff=1返回1说明其次日登录,否则返回0说明其次日未登录 # 得出0,1组成的列,对其求平均值即新用户平均次日留存率 # 如求每一天的新用户平均次日留存率 WITH T1 AS (SELECT DISTINCT logid AS id, DATE(logdate) AS date FROM login), T2 AS (SELECT *, DATEDIFF(LEAD(date,1,date) OVER(PARTITION BY id ORDER BY date ASC),date) AS diff, date=MIN(date) OVER(PARTITION BY id) AS tag FROM T1) SELECT AVG(diff=1) FROM T2 WHERE tag=1; SELECT date,IFNULL(AVG(IF(tag=0,NULL,diff=1)),0) FROM T2 GROUP BY date; # 按照日期分组 # 如果tag=0说明该用户不是首次登录,返回NULL,不参与平均值的计算 # diff=1返回1说明其次日登录,否则返回0说明其次日未登录 # 得出0,1组成的列,对其求平均值即新用户平均次日留存率 # 此时可能返回NULL即该日没有新用户登录,IFNULL将其转换为0,某日无新用户则直接认为留存率为0 # 基于留存率的解法,可求解每个用户的最大登录间隔问题,请自行实现
4.连续天数
WITH T1 AS (SELECT DISTINCT logid AS id, DATE(logdate) AS date FROM login), T2 AS (SELECT *, SUBDATE(date,ROW_NUMBER(PARTITION BY id ORDER BY date ASC)) AS val, # 对于每个用户按日期升序赋予序号,用当前日期减去序号对应的天数 # 如果当前日期与下一日期减对应序号是同一日期说明这两天是连续的 # 同理,若多个val的相同说明对应的日期均为连续日期 # 此处存在隐含条件,即序号不能超过当前日期对应的数值,因为最小日期为0000-00-00(但通常不会所以暂不考虑) FROM T1), T3 AS (SELECT *, COUNT(val) OVER(PARTITION BY id,val ORDER BY date ASC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS cnt # 先按id分区再按val分区,并按日期升序,求从当前行起至分区尾行的数量,即每个用户每一天起的连续天数 date=MIN(date) OVER(PARTITION BY id) AS tag) # 同理用于判断新用户的首次登录日期 # 以上已得出计算连续天数的所有信息,后续计算基于此即可方便地得出 # 每个用户的最大连续登录天数 WITH T1 AS (SELECT DISTINCT logid AS id, DATE(logdate) AS date FROM login), T2 AS (SELECT *, SUBDATE(date,ROW_NUMBER(PARTITION BY id ORDER BY date ASC)) AS val, FROM T1), T3 AS (SELECT *, COUNT(val) OVER(PARTITION BY id,val ORDER BY date ASC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS cnt date=MIN(date) OVER(PARTITION BY id) AS tag) SELECT id,MAX(cnt) FROM T3 GROUP BY id; # 新用户连续登录的最大天数 WITH T1 AS (SELECT DISTINCT logid AS id, DATE(logdate) AS date FROM login), T2 AS (SELECT *, SUBDATE(date,ROW_NUMBER(PARTITION BY id ORDER BY date ASC)) AS val, FROM T1), T3 AS (SELECT *, COUNT(val) OVER(PARTITION BY id,val ORDER BY date ASC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS cnt date=MIN(date) OVER(PARTITION BY id) AS tag) SELECT MAX(IF(tag=1,cnt,0)) FROM T3; # 特别地,对于求连续登录天数大于等于指定数值的可采用求留存率的方式实现,请自行实现 /* 基于连续天数解法可求解用户连续登录的总奖励值问题。 例如,连续登录的每一天都增加一点奖励值,七天一个循环,其中每个7天中第3天额外增加3点,第7天额外增加5点,求每个用户的总奖励值,请自行实现 */
5.相关题目
5.1留存率类
每日的新用户数(其一):SQL263 牛客每个人最近的登录日期(四)
每日的新用户数(其二):SQL166 每天的日活数及新用户占比
每日的次日留存率:SQL29 计算用户的平均次日留存率
新用户的次日留存率:SQL262 牛客每个人最近的登录日期(三)
每日的新用户的次日留存率(其一):SQL264 牛客每个人最近的登录日期(五)
每日的新用户的次日留存率(其二):SQL164 2021年11月每天新用户的次日留存率
次日留存率的变种:SQL138 连续两次作答试卷的最大时间窗
5.2连续天数类
连续登录天数大于等于指定天数(也可采用留存率解法):SQL184 某宝店铺连续2天及以上购物的用户及其对应的天数
连续登录天数大于等于指定天数(同上):SQL194 某乎问答最大连续回答问题天数大于等于3天的用户及其对应等级
连续等于大于等于指定天数(同上):SQLW11 查询连续登陆的用户
每个用户的最大连续登录天数:SQLW2 最长连续登录天数
连续天数的变种:SQL167 连续签到领金币
更多知识在专栏
#SQL进阶#MySQL的使用 文章被收录于专栏
由于专栏功能问题造成阅读不便,本专栏放弃更新,所有内容将迁移到计算机编程合集专栏下