留存概念:
次X日活躍留存,次X日新增留存,也就是看今天的新增或活躍用戶在后續幾天的留存情況
一、留存表的生成邏輯
因為用戶活躍日期和留存的日期無法對齊所以搞了2級分區(dt,static_day)
1)首先獲得計算日D、根據要出的次X日留存,推算出前面的DT ,整體從活躍表里根據這些日期生成臨時活躍表tmp1
2)分別把計算DT和前X日的DT進行匹配,按相差的天數進行匹配,如果匹配一直分別得到對應的次X日留存標識。
3)需要使用1個新的字段存儲留存指標的的日期,比如20250701號的留存keep1_num只能等20250702號過完才能計算,那對應也是7.1號算留存日期,是指在DT=20250702的留存時間。
所以根據dt往前推算的日期都是留存日期,不能寫到dt這個字段里,因為除了留存指標外還要計算統計日的指標。
如果留存日期=統計日期的,出的當日活躍。留存日期< 統計日期的話,出的是次X日留存指標。
--活躍臨時表
create table tmp1 as
select ,t1.uuid,t1.dt as statis_day,case when t1.dt='${DT}' then 'Y' else 'N' end as keep_0d_active_flag,case when t1.dt=regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-1), '-', '') then 'Y'else 'N' end as keep_1d_active_flag,case when t1.dt=regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-2), '-', '') then 'Y'else 'N' end as keep_2d_active_flag ,case when t1.dt=regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-3), '-', '') then 'Y'else 'N' end as keep_3d_active_flag ,case when t1.dt=regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-4), '-', '') then 'Y'else 'N' end as keep_4d_active_flag ,case when t1.dt=regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-5), '-', '') then 'Y'else 'N' end as keep_5d_active_flag ,case when t1.dt=regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-6), '-', '') then 'Y'else 'N' end as keep_6d_active_flag ,case when t1.dt=regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-7), '-', '') then 'Y'else 'N' end as keep_7d_active_flag from 活躍表 t1
where t1.dt in (
${DT}
,regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-1), '-', '')
,regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-2), '-', '')
,regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-3), '-', '')
,regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-4), '-', '')
,regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-5), '-', '')
,regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-6), '-', '')
,regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-7), '-', '')
);--當日活躍以及留存指標
insert overwrite table 留存表 partition(dt='${DT}')
select group_id,statis_day,channel,version,sum(case when keep_0d_active_flag='Y' then 1 else 0 end) as av,sum(case when keep_1d_active_flag='Y' then 1 else 0 end) as keep_1d_av,sum(case when keep_2d_active_flag='Y' then 1 else 0 end) as keep_2d_av,sum(case when keep_3d_active_flag='Y' then 1 else 0 end) as keep_3d_av,sum(case when keep_4d_active_flag='Y' then 1 else 0 end) as keep_4d_av,sum(case when keep_5d_active_flag='Y' then 1 else 0 end) as keep_5d_av,sum(case when keep_6d_active_flag='Y' then 1 else 0 end) as keep_6d_av,sum(case when keep_7d_active_flag='Y' then 1 else 0 end) as keep_7d_av
from(select cast(grouping__id as bigint)& 7 & 3 as group_id,channel,uuid,statis_day,max(keep_1d_active_flag) as keep_1d_active_flag,max(keep_2d_active_flag) as keep_2d_active_flag,max(keep_3d_active_flag) as keep_3d_active_flag,max(keep_4d_active_flag) as keep_4d_active_flag,max(keep_5d_active_flag) as keep_5d_active_flag,max(keep_6d_active_flag) as keep_6d_active_flag,max(keep_7d_active_flag) as keep_7d_active_flag from tmp1group by ,channel --1,version --2 ,uuid -- 4,statis_day --8 grouping sets( (channel,uuid,statis_day) ,(version,uuid,statis_day),(uuid,statis_day) )
) ta
group by group_id,statis_day,channel,version
二、對于留存的表的查詢處理
1)非留存指標的話,直接使用where dt between ‘20250701’ and ‘20250707’
2)對于留存指標要取static_day,這個static_day是代表留存日期在dt的不同留存指標。
select
dt
,sum(active_num)
,sum(keep1_num)
,sum(keep2_num)
,sum(keep3_num)
,sum(keep4_num)
from
(select
dt,
,active_num
,0 as keep1_num
,0 as keep2_num
,0 as keep3_num
,0 as keep4_num
from 留存表 where dt between ‘20250701’ and ‘20250704’
union all
select
static_day dt,
,0 as active_num
,keep1_num
,keep2_num
,keep3_num
,keep4_num
from 留存表 where static_day between ‘20250701’ and ‘20250704’
) t group by dt