下列為電商公司用戶訪問時間數據 | 統計某個用戶連續的訪問記錄,如果時間間隔小于60s,就分為一組 | |
id???????ts 1001?17523641234 1001?17523641256 1002?17523641278 1001?17523641334 1002?17523641434 1001?17523641534 1001?17523641544 1002?17523641634 1001?17523641638 1001?17523641654 | id???????ts???????????group 1001?17523641234???????1 1001?17523641256???????1 1001?17523641334???????2 1001?17523641534???????3 1001?17523641544???????3 1001?17523641638???????4 1001?17523641654???????4 1002?17523641278???????1 1002?17523641434???????2 1002?17523641634???????3 |
2.?需求:
①?創建表
②?統計每個用戶連續的訪問記錄中,如果時間間隔小于60s,就分為一個組(觀察數是標準的時間戳嗎?)
3.?數據準備:
編輯文件user_visit.txt
[atguigu@hadoop102 ~]$?vim /opt/module/hive/datas/user_visit.txt
1001?17523641234
1001?17523641256
1002?17523641278
1001?17523641334
1002?17523641434
1001?17523641534
1001?17523641544
1002?17523641634
1001?17523641638
1001?17523641654
select id, ts, sum(flag) over (partition by id order by ts) as id_group
from
(select id, ts, lag_ts, (ts - lag_ts) as diff, if((ts - lag_ts) < 60, 0, 1) as flag from(select id, ts,lag(ts, 1, 0) over (partition by id order by ts) as lag_tsfrom test2) tb1
)tb2