首页 > 试题广场 >

25年网易云音乐-歌手新歌首发后听众分层次日留存与时段活跃矩

[编程题]25年网易云音乐-歌手新歌首发后听众分层次日留存与时段活跃矩
  • 热度指数:259 时间限制:C/C++ 1秒,其他语言2秒 空间限制:C/C++ 256M,其他语言512M
  • 算法知识视频讲解

1. 题目描述

背景
某音乐流媒体平台的内容运营团队在做独立音乐人新歌发布效果分析。每当一首新歌在某位歌手的主页首发上线,团队希望分析:首发当日完整听完该新歌的听众("首发完听用户"),在接下来的 D+1、D+3、D+7 日,是否还在该歌手的任意歌曲下继续产生完整收听(即再次回流听这位歌手)。同时按听众在首发日的收听时段做分层,产出一张歌手 × 时段 × 回流日的矩阵。

表 1:t_song(歌曲表)

  • song_id:BIGINT,歌曲 ID
  • artist_id:BIGINT,歌手 ID
  • song_name:VARCHAR(64),歌曲名
  • duration_sec:INT,歌曲总时长(秒)
  • release_time:DATETIME,首发上线时间(2025 年内,含时分秒;每首歌一个)

表 2:t_play(收听流水表)

  • play_id:BIGINT,收听记录 ID
  • user_id:BIGINT,用户 ID
  • song_id:BIGINT,歌曲 ID(关联 t_song.song_id)
  • play_start:DATETIME,开始收听时间
  • play_sec:INT,实际收听秒数(可能小于 duration_sec)

2. 问题

第一步:识别"首发完听用户"
对每首新歌,首发日 =DATE(release_time)。定位每个歌手首发日当天收听了该歌手该首新歌,且至少有一条 play_sec >= duration_sec(即完整听完一次)的用户。把这些用户记为该歌手的"首发完听用户"。同一用户若在多首新歌的首发日都完听,可重复计入不同歌手的基数。

第二步:按首发日收听时段分层(对每个"首发完听用户 × 歌手"组合单独判定)
取该用户当天对该歌手该新歌最早一条 play_start,按小时归入:

  • morning:HOUR(play_start) ∈ [6, 11](即 06:00:00–11:59:59)
  • afternoon:HOUR(play_start) ∈ [12, 17]
  • night:其他所有时间([18, 23]与[0, 5]都算 night)

第三步:回流判定
回流 = 该用户在 D+N 当天(日历日) 对同一歌手的任意歌曲产生至少一次完整收听(play_sec >= duration_sec)。

  • D+1 =DATE_ADD(release_date, INTERVAL 1 DAY)
  • D+3 =DATE_ADD(release_date, INTERVAL 3 DAY)
  • D+7 =DATE_ADD(release_date, INTERVAL 7 DAY)
  • 若同一歌手同日有多首新歌首发(不同 song_id),以各自的 release_date 分别判定用户回流;同一用户对同一歌手可能被多首新歌各自计入基数(允许重复)。

输出字段

  • artist_id(歌手 ID)
  • time_slot(时段:morning / afternoon / night)
  • base_user_cnt(该歌手该时段的首发完听用户基数,即"用户 × 新歌"组合数)
  • d1_rate(D+1 回流率,四舍五入保留 2 位小数)
  • d3_rate(D+3 回流率,同上)
  • d7_rate(D+7 回流率,同上)

只输出base_user_cnt > 0的行;无需输出空组合。

排序规则(严格按此顺序):

  1. d7_rate降序
  2. base_user_cnt降序
  3. artist_id升序
  4. time_slot按固定顺序 morning → afternoon → night(不是字典序

3. 示例数据表

t_song

song_id artist_id song_name duration_sec release_time
1001 501 Midnight Drive 210 2025-04-10 00:00:00
1002 501 Summer Light 180 2025-04-15 00:00:00
1003 502 Quiet Street 240 2025-04-10 00:00:00

t_play

play_id user_id song_id play_start play_sec
1 9001 1001 2025-04-10 08:15:00 210
2 9001 1002 2025-04-11 09:00:00 180
3 9001 1001 2025-04-13 22:00:00 210
4 9002 1001 2025-04-10 14:00:00 210
5 9002 1001 2025-04-10 14:40:00 100
6 9002 1001 2025-04-17 15:00:00 210
7 9003 1001 2025-04-10 23:30:00 209
8 9004 1001 2025-04-10 20:00:00 210
9 9004 1001 2025-04-11 10:00:00 150
10 9001 1002 2025-04-15 10:00:00 180
11 9001 1001 2025-04-16 10:00:00 210
12 9001 1001 2025-04-18 10:00:00 210
13 9001 1001 2025-04-22 10:00:00 210
14 9005 1003 2025-04-10 07:00:00 240
15 9005 1003 2025-04-17 07:00:00 240

4. 示例数据查询结果表

推导

歌手 501 共有两次新歌首发(1001 日首发于 04-10,1002 首发于 04-15),"首发完听用户"组合(user × song):

  • (9001, 1001):首发日完听 at 08:15(morning);D+1 04-11 听 1002 完听 ✓;D+3 04-13 听 1001 完听 ✓;D+7 04-17 无 501 完听记录 ✗
  • (9002, 1001):首发日完听 at 14:00(afternoon);D+1 04-11 ✗;D+3 04-13 ✗;D+7 04-17 听 1001 完听 ✓
  • (9003, 1001):首发日未完听(209<210),不进基数
  • (9004, 1001):首发日完听 at 20:00(night);D+1 04-11 play_sec=150<210 ✗;D+3/D+7 无 ✗
  • (9001, 1002):首发日 04-15 完听 at 10:00(morning);D+1 04-16 听 1001 完听 ✓;D+3 04-18 听 1001 完听 ✓;D+7 04-22 听 1001 完听 ✓

歌手 501 分组:

  • morning:(9001,1001)、(9001,1002) → base=2;D+1=1.00;D+3=1.00;D+7=0.50
  • afternoon:(9002,1001) → base=1;D+1=0.00;D+3=0.00;D+7=1.00
  • night:(9004,1001) → base=1;D+1=0.00;D+3=0.00;D+7=0.00

歌手 502:

  • morning:(9005,1003) → base=1;D+1=0.00;D+3=0.00;D+7=1.00

按严格排序规则(d7 降序 → base 降序 → artist_id 升序 → slot 固定序):

artist_id time_slot base_user_cnt d1_rate d3_rate d7_rate
501 afternoon 1 0.00 0.00 1.00
502 morning 1 0.00 0.00 1.00
501 morning 2 1.00 1.00 0.50
501 night 1 0.00 0.00 0.00
示例1

输入

CREATE TABLE t_song (
    song_id       BIGINT,
    artist_id     BIGINT,
    song_name     VARCHAR(64),
    duration_sec  INT,
    release_time  DATETIME
);

CREATE TABLE t_play (
    play_id     BIGINT,
    user_id     BIGINT,
    song_id     BIGINT,
    play_start  DATETIME,
    play_sec    INT
);

INSERT INTO t_song VALUES
(1001, 501, 'Midnight Drive',   210, '2025-04-10 00:00:00'),
(1002, 501, 'Summer Light',     180, '2025-04-15 00:00:00'),
(1003, 502, 'Quiet Street',     240, '2025-04-10 00:00:00');

INSERT INTO t_play VALUES
(1,  9001, 1001, '2025-04-10 08:15:00', 210),
(2,  9001, 1002, '2025-04-11 09:00:00', 180),
(3,  9001, 1001, '2025-04-13 22:00:00', 210),
(4,  9002, 1001, '2025-04-10 14:00:00', 210),
(5,  9002, 1001, '2025-04-10 14:40:00', 100),
(6,  9002, 1001, '2025-04-17 15:00:00', 210),
(7,  9003, 1001, '2025-04-10 23:30:00', 209),
(8,  9004, 1001, '2025-04-10 20:00:00', 210),
(9,  9004, 1001, '2025-04-11 10:00:00', 150),
(10, 9001, 1002, '2025-04-15 10:00:00', 180),
(11, 9001, 1001, '2025-04-16 10:00:00', 210),
(12, 9001, 1001, '2025-04-18 10:00:00', 210),
(13, 9001, 1001, '2025-04-22 10:00:00', 210),
(14, 9005, 1003, '2025-04-10 07:00:00', 240),
(15, 9005, 1003, '2025-04-17 07:00:00', 240);

输出

artist_id|time_slot|base_user_cnt|d1_rate|d3_rate|d7_rate
501|afternoon|1|0.00|0.00|1.00
502|morning|1|0.00|0.00|1.00
501|morning|2|1.00|1.00|0.50
501|night|1|0.00|0.00|0.00
with b as (
    select h.user_id,h.song_id,h.play_start,i.artist_id
    from t_play h
    join t_song i
    on h.song_id=i.song_id
    where h.play_sec>=i.duration_sec
),
c as(
select a.artist_id,b.song_id,b.user_id,date(b.play_start) as play_start,
       if(hour(b.play_start) between 6 and 11,'morning',
       if(hour(b.play_start) between 12 and 17,'afternoon',
       'night')) as time_slot,
       rank() over(partition by b.song_id,user_id order by b.play_start) as rk
from b
join t_song a
on a.song_id=b.song_id
where DATE(a.release_time)=date(b.play_start)),
d as (
select c.artist_id,c.time_slot,c.user_id,c.play_start,
        datediff(date(b.play_start),date(c.play_start)) as daydiff
from c
left join b
on c.user_id=b.user_id
and c.artist_id=b.artist_id
where c.rk=1 )
select d.artist_id,d.time_slot,
       count(distinct d.user_id,d.play_start) as base_user_cnt,
       round(count(if(d.daydiff=1,d.user_id,null))/count(distinct d.user_id,d.play_start),2) as d1_rate,
       round(count(if(d.daydiff=3,d.user_id,null))/count(distinct d.user_id,d.play_start),2) as d3_rate,
       round(count(if(d.daydiff=7,d.user_id,null))/count(distinct d.user_id,d.play_start),2) as d7_rate
from d
group by d.artist_id,d.time_slot
order by d7_rate desc,base_user_cnt desc,artist_id,time_slot desc
发表于 2026-04-29 22:29:58 回复(1)
select t2.artist_id,time_slot,count(t2.user_id) as base_user_cnt,
round(sum(if(EXISTS(
  select 1
  from t_play b
join t_song a
on a.song_id = b.song_id
  where
    t2.user_id = b.user_id          
    and a.artist_id = t2.artist_id      
    and b.play_sec >= a.duration_sec    
    and date(b.play_start) = date(t2.release_time)+ interval 1 day),1,0))/COUNT(t2.user_id),2)as d1_rate,
round(sum(if(EXISTS(
  select 1
  from t_play b
join t_song a
on a.song_id = b.song_id
  where
    t2.user_id = b.user_id          
    and a.artist_id = t2.artist_id      
    and b.play_sec >= a.duration_sec    
    and date(b.play_start) = date(t2.release_time)+ interval 3 day),1,0))/COUNT(t2.user_id),2)as d3_rate,
round(sum(if(EXISTS(
  select 1
  from t_play b
join t_song a
on a.song_id = b.song_id
  where
    t2.user_id = b.user_id          
    and a.artist_id = t2.artist_id      
    and b.play_sec >= a.duration_sec    
    and date(b.play_start) = date(t2.release_time)+ interval 7 day),1,0))/COUNT(t2.user_id),2)as d7_rate
 from
(select t1.artist_id,if(HOUR(t1.play_start)between 6 and 11,'morning',if(HOUR(t1.play_start)between 12 and 17,'afternoon','night')) as time_slot,t1.song_id,t1.release_time,t1.user_id from
(select a.artist_id,a.song_id,b.user_id,a.release_time,b.play_start from t_song a
join t_play b
on a.song_id=b.song_id
where DATE(b.play_start)=DATE(a.release_time)
and b.play_sec>= a.duration_sec)t1)t2
group by t2.artist_id,time_slot
having base_user_cnt > 0
order by d7_rate desc,base_user_cnt desc,artist_id asc,field(time_slot,'morning','afternoon','night')
发表于 2026-04-29 21:06:40 回复(0)
with a as(
select user_id,song_id,artist_id,date(release_time) as release_date,case
when hour(min(play_start)) between 6 and 11 then 'morning'
when hour(min(play_start)) between 12 and 17 then 'afternoon'
else 'night' end as time_slot
from t_play left join t_song using(song_id)
where (user_id,song_id) in 
(select user_id,song_id from t_play left join t_song using(song_id)
where date(release_time)=date(play_start) and play_sec>=duration_sec
group by user_id,song_id) 
and date(release_time)=date(play_start)
group by user_id,song_id,artist_id,release_date),
b as (
select a.user_id,a.song_id,release_date,a.artist_id,play_date,time_slot from a 
left join (select user_id,song_id,artist_id,date(play_start) as play_date from t_play left join t_song using(song_id)
where play_sec>=duration_sec)t
on t.artist_id=a.artist_id and t.user_id=a.user_id and play_date>release_date)
select artist_id,time_slot
,count(distinct concat(user_id,song_id)) as base_user_cnt
,round(sum(if(datediff(play_date,release_date)=1,1,0))/count(distinct concat(user_id,song_id)),2) as d1_rate
,round(sum(if(datediff(play_date,release_date)=3,1,0))/count(distinct concat(user_id,song_id)),2) as d3_rate
,round(sum(if(datediff(play_date,release_date)=7,1,0))/count(distinct concat(user_id,song_id)),2) as d7_rate
from b
group by artist_id,time_slot
having base_user_cnt>0
order by d7_rate desc,base_user_cnt desc,artist_id,field(time_slot,'morning','afternoon','night')

发表于 2026-04-28 18:54:55 回复(0)
WITH
#首发日观众完整听完的第一次记录 & 歌曲 & 歌手信息
base_user AS(
    SELECT
       t2.user_id,
       t1.artist_id,
       t1.song_id,
       t1.duration_sec,
       t1.release_time,
       #第一次听歌时间
       MIN(t2.play_start) first_play
    FROM t_song t1
        INNER JOIN t_play t2 ON t1.song_id = t2.song_id
        #必须要完整听完才行
        AND t1.duration_sec <= t2.play_sec
        #首发日记录
        AND DATE_FORMAT(t1.release_time, '%Y-%m-%d') = DATE_FORMAT(t2.play_start, '%Y-%m-%d')
    #按照artist,song,user分组,选出每个用户,每首歌,每个歌手的首次听歌时间
    GROUP BY t1.artist_id, t1.song_id, t2.user_id, t1.duration_sec, t1.release_time
),
#按对每一首歌,按用户首发听歌时间按时间段打标
user_time_slot AS(
    SELECT
        user_id, artist_id, song_id, duration_sec, release_time,
        CASE
            WHEN HOUR(first_play) BETWEEN 6 AND 11 THEN 'morning'
            WHEN HOUR(first_play) BETWEEN 12 AND 17 THEN 'afternoon'
            ELSE 'night'
        END time_slot
    FROM base_user
),
#统计用户在每个歌手(不是每首歌)的回流情况
user_back_cnt AS(
    SELECT
        t1.song_id,
        t1.artist_id,
        t1.user_id,
        #d1回流
        MAX(
            CASE WHEN
                #满足日期条件即可
                t2.play_start >= DATE_ADD(t1.release_time, INTERVAL 1 DAY)
                AND t2.play_start < DATE_ADD(t1.release_time, INTERVAL 2 DAY)
                #满足完整收听
                AND t2.play_sec >= t3.duration_sec
            THEN 1 ELSE 0 END
        ) d1,
        #d3回流
        MAX(
            CASE WHEN
                #满足日期条件即可
                t2.play_start >= DATE_ADD(t1.release_time, INTERVAL 3 DAY)
                AND t2.play_start < DATE_ADD(t1.release_time, INTERVAL 4 DAY)
                #满足完整收听
                AND t2.play_sec >= t3.duration_sec
            THEN 1 ELSE 0 END
        ) d3,
        #d7回流
        MAX(
            CASE WHEN
                #满足日期条件即可
                t2.play_start >= DATE_ADD(t1.release_time, INTERVAL 7 DAY)
                AND t2.play_start < DATE_ADD(t1.release_time, INTERVAL 8 DAY)
                #满足完整收听
                AND t2.play_sec >= t3.duration_sec
            THEN 1 ELSE 0 END
        ) d7
    FROM user_time_slot t1
        #只要是同一歌手就可以,不要求是同一首歌,所以t1不用song_id关联
        #我们通过t2的song_id获取到对应的artist_id,然后将t1的artist_id进行关联,表示同一歌手收听情况即可!
        INNER JOIN t_play t2
            ON t1.user_id = t2.user_id
        #歌曲信息,将artist关联
        INNER JOIN t_song t3
            ON t2.song_id = t3.song_id
            AND t1.artist_id = t3.artist_id
    GROUP BY t1.song_id, t1.artist_id, t1.user_id
)
#按歌手,时间段聚合统计
SELECT
    t1.artist_id,
    t1.time_slot,
    COUNT(*) base_user_cnt,
    ROUND(SUM(d1) / COUNT(*), 2) d1_rate,
    ROUND(SUM(d3) / COUNT(*), 2) d3_rate,
    ROUND(SUM(d7) / COUNT(*), 2) d7_rate
FROM user_time_slot t1
    INNER JOIN user_back_cnt t2 ON t1.user_id = t2.user_id AND t1.artist_id = t2.artist_id AND t1.song_id = t2.song_id
GROUP BY t1.artist_id, t1.time_slot
ORDER BY d7_rate DESC, base_user_cnt DESC, artist_id ASC, FIELD(t1.time_slot, 'morning', 'afternoon', 'night');
发表于 2026-04-28 11:41:47 回复(0)