間隔連續問題
1.?數據結構:某游戲公司記錄的用戶每日登錄數據
????表名:game_user
????字段名:id(用戶id)、dt(日期)
2.?需求:
①?創建表
②?計算每個用戶最大的連續登錄天數,可以間隔一天。如:如果一個用戶在1,3,5,6登錄游戲,則視為連續6天登錄。
3.?數據準備:
創建文件game_user.txt
[atguigu@hadoop102 ~]$?vim /opt/module/hive/datas/game_user.txt
1001?2022-05-01 23:21:33
1003?2022-05-02 23:21:33
1002?2022-05-01 23:21:33
1003?2022-05-01 23:21:33
1001?2022-05-03 23:21:33
1003?2022-05-04 23:21:33
1002?2022-05-01 23:21:33
1001?2022-05-05 23:21:33
1001?2022-05-01 23:21:33
1002?2022-05-06 23:21:33
1001?2022-05-06 23:21:33
1001?2022-05-07 23:21:33
4.?答案:
①?創建表
hive(default)>
create table game_user(
????????id??bigint,
????????dt??string
)
row format delimited?
fields terminated by?'\t';
②?計算每個用戶最大的連續登錄天數,可以間隔一天
select id, datediff(max_dt, min_dt)
from
(
select id, max(dt) as max_dt, min(lag_dt) as min_dt
from
(select id, dt, lag_dt, datediff(dt, lag_dt) as date_diff, if(datediff(dt, lag_dt) > 2, 1, 0) as flag from(select id, dt, lag(dt, 1, '1970-01-01') over (partition by id order by dt asc) as lag_dtfrom (select id, date_format(dt, 'yyyy-MM-dd') as dtfrom game_user) tb1) tb2
)tb3where flag = 0
group by id
) tb4;