题解 | #连续两次作答试卷的最大时间窗#
连续两次作答试卷的最大时间窗
http://www.nowcoder.com/practice/9dcc0eebb8394e79ada1d4d4e979d73c
明确题意:
计算在2021年至少有两天作答过试卷的人中,计算该年连续两次作答试卷的最大时间窗days_window,那么根据该年的历史规律他在days_window天里平均会做多少套试卷,按最大时间窗和平均做答试卷套数倒序排序。
问题拆解:
- 本题主要是考察知识点:lead、date_format,timestampdiff、group by等。切记是以start_time计算,不是以submit_time计算!!!有些submit_time是NULL!!
- 先计算出重复次数大于2的记录的uid
- 使用lead,把下次start_time作为新的一列
- 查询出最大start_time,最小start_time,days_window
- 查询出总共持续天数days_count
- 计算出avg_exam_cnt
中间过程:
mysql> select timestampdiff(day, date_format('2021-09-01 19:59:01','%Y-%m-%d') ,date_format('2021-09-07 12:11:01','%Y-%m-%d')) ;
+------------------------------------------------------------------------------------------------------------------+
| timestampdiff(day, date_format('2021-09-01 19:59:01','%Y-%m-%d') ,date_format('2021-09-07 12:11:01','%Y-%m-%d')) |
+------------------------------------------------------------------------------------------------------------------+
| 6 |
+------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> select *,
-> lead(start_time, 1) over(partition by uid order by start_time asc) as next_start_time -- lead 往后查询
-> from exam_record
-> where year(start_time) = '2021' and uid in (
-> select uid from (
-> select distinct uid,date_format(start_time,'%Y-%m-%d')
-> from exam_record where year(start_time) = '2021'
-> )t0 group by uid having count(*) > 1 -- 重复大于1的才行
-> )
-> ;
+----+------+---------+---------------------+---------------------+-------+---------------------+
| id | uid | exam_id | start_time | submit_time | score | next_start_time |
+----+------+---------+---------------------+---------------------+-------+---------------------+
| 9 | 1003 | 9003 | 2021-09-01 19:01:01 | 2021-09-01 19:59:01 | 86 | 2021-09-07 12:01:01 |
| 12 | 1003 | 9003 | 2021-09-07 12:01:01 | 2021-09-07 12:11:01 | 40 | 2021-09-08 15:01:01 |
| 13 | 1003 | 9003 | 2021-09-08 15:01:01 | NULL | NULL | NULL |
| 11 | 1005 | 9003 | 2021-09-01 12:01:01 | 2021-09-01 12:31:02 | 85 | 2021-09-05 10:01:01 |
| 4 | 1005 | 9002 | 2021-09-05 10:01:01 | 2021-09-05 10:21:01 | 81 | 2021-09-05 10:31:01 |
| 5 | 1005 | 9001 | 2021-09-05 10:31:01 | 2021-09-05 10:51:01 | 81 | NULL |
| 2 | 1006 | 9001 | 2021-09-01 12:11:01 | 2021-09-01 12:31:01 | 89 | 2021-09-06 10:01:01 |
| 3 | 1006 | 9002 | 2021-09-06 10:01:01 | 2021-09-06 10:21:01 | 81 | 2021-09-07 10:01:01 |
| 1 | 1006 | 9003 | 2021-09-07 10:01:01 | 2021-09-07 10:21:02 | 84 | NULL |
+----+------+---------+---------------------+---------------------+-------+---------------------+
9 rows in set (0.00 sec)
mysql>
mysql> select
-> uid ,
-> max(start_time) as max_time ,
-> min(start_time) as min_time ,
-> max(timestampdiff(day, date_format(start_time,'%Y-%m-%d') ,date_format(next_start_time,'%Y-%m-%d')))+1 as days_window , -- 早的日期放在前面
-> count(*) as exam_cnt
-> from (
-> select *,
-> lead(start_time, 1) over(partition by uid order by start_time asc) as next_start_time -- lead 往后查询
-> from exam_record
-> where year(start_time) = '2021' and uid in (
-> select uid from (
-> select distinct uid,date_format(start_time,'%Y-%m-%d')
-> from exam_record where year(start_time) = '2021'
-> )t0 group by uid having count(*) > 1 -- 重复大于1的才行
-> )
-> )t1
-> group by uid
-> ;
+------+---------------------+---------------------+-------------+----------+
| uid | max_time | min_time | days_window | exam_cnt |
+------+---------------------+---------------------+-------------+----------+
| 1003 | 2021-09-08 15:01:01 | 2021-09-01 19:01:01 | 7 | 3 |
| 1005 | 2021-09-05 10:31:01 | 2021-09-01 12:01:01 | 5 | 3 |
| 1006 | 2021-09-07 10:01:01 | 2021-09-01 12:11:01 | 6 | 3 |
+------+---------------------+---------------------+-------------+----------+
3 rows in set (0.00 sec)
mysql>
mysql> select
-> uid ,
-> timestampdiff(day, date_format(min_time,'%Y-%m-%d') ,date_format(max_time,'%Y-%m-%d'))+1 as days_count,
-> days_window,
-> exam_cnt
-> from (
-> select
-> uid ,
-> max(start_time) as max_time ,
-> min(start_time) as min_time ,
-> max(timestampdiff(day, date_format(start_time,'%Y-%m-%d') ,date_format(next_start_time,'%Y-%m-%d')))+1 as days_window , -- 早的日期放在前面
-> count(*) as exam_cnt
-> from (
-> select *,
-> lead(start_time, 1) over(partition by uid order by start_time asc) as next_start_time -- lead 往后查询
-> from exam_record
-> where year(start_time) = '2021' and uid in (
-> select uid from (
-> select distinct uid,date_format(start_time,'%Y-%m-%d')
-> from exam_record where year(start_time) = '2021'
-> )t0 group by uid having count(*) > 1 -- 重复大于1的才行
-> )
-> )t1
-> group by uid
-> )t3
-> ;
+------+------------+-------------+----------+
| uid | days_count | days_window | exam_cnt |
+------+------------+-------------+----------+
| 1003 | 8 | 7 | 3 |
| 1005 | 5 | 5 | 3 |
| 1006 | 7 | 6 | 3 |
+------+------------+-------------+----------+
3 rows in set (0.00 sec)
代码实现:
select uid , days_window , round(days_window * (exam_cnt /days_count ),2) as avg_exam_cnt from ( select uid , timestampdiff(day, date_format(min_time,'%Y-%m-%d') ,date_format(max_time,'%Y-%m-%d'))+1 as days_count, -- 差值要+1 days_window, exam_cnt from ( select uid , max(start_time) as max_time , min(start_time) as min_time , max(timestampdiff(day, date_format(start_time,'%Y-%m-%d') ,date_format(next_start_time,'%Y-%m-%d')))+1 as days_window , -- 记得+1, 早的日期放在前面 count(*) as exam_cnt from ( select *, lead(start_time, 1) over(partition by uid order by start_time asc) as next_start_time -- lead 往后查询 from exam_record where year(start_time) = '2021' and uid in ( select uid from ( select distinct uid,date_format(start_time,'%Y-%m-%d') from exam_record where year(start_time) = '2021' )t0 group by uid having count(*) > 1 -- 重复大于1的才行 ) )t1 group by uid )t3 )t4 order by days_window desc ,avg_exam_cnt desc ;
