窗口函數
https://www.cnblogs.com/Uni-Hoang/p/17411313.html
<窗口函數> OVER ([PARTITION BY <分組列> [, <分組列>...]][ORDER BY <排序列> [ASC | DESC] [, <排序列> [ASC | DESC]]...][<rows or range clause>])
)
窗口函數主要是三類:
- 聚合函數,如sum、max、avg、min 等等
- 排序函數,如row_number、rank、dense_rank
- 分析函數,如lead、lag、first_value、last_VALUE
窗口邊界的使用詳解
preceding 往前following 往后current row 當前行unbounded 起點unbounded preceding 從前面的起點unbounded following 到后面的終點
-
如果不指定 PARTITION BY,則不對數據進行分區,換句話說,所有數據看作同一個分區;
-
如果不指定 ORDER BY,則不對各分區做排序,通常用于那些順序無關的窗口函數,例如 SUM()
-
如果不指定窗口子句,則默認采用以下的窗口定義:
a、若不指定 ORDER BY,默認使用分區內所有行 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.
b、若指定了 ORDER BY,默認使用分區內第一行到當前值 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
explode()函數和posexplode()函數
- explode就是將hive一行中復雜的array或者map結構拆分成多行
- posexplode 特點是不僅炸裂出數值,還附帶索引,實現多列進行多行轉換;
hive>with temp as( select 'a,b,c,d' as examp_data1,'2:00,3:00,4:00,5:00' as examp_data2)select
data1
,data2
from temp
lateral view explode(split(examp_data1,',')) view1 as data1
lateral view explode(split(examp_data2,',')) view1 as data2
data1,data2
a,2:00
a,3:00
a,4:00
a,5:00
b,2:00
b,3:00
b,4:00
b,5:00
c,2:00
c,3:00
c,4:00
c,5:00
d,2:00
d,3:00
d,4:00
d,5:00
with temp as( select 'a,b,c,d' as examp_data1,'2:00,3:00,4:00,5:00' as examp_data2)select
data1
,data2
from temp
lateral view posexplode(split(examp_data1,',')) view1 as index1,data1
lateral view posexplode(split(examp_data2,',')) view1 as index2,data2
where index1=index2
data1,data2
a,2:00
b,3:00
c,4:00
d,5:00
lateral view 和 lateral view outer 的區別
lateral view explode 炸裂的數組中如果存在null,則這一條數據舍棄;lateral view ourer explode 炸裂的數組中如果存在null,則這一條數據保留,炸裂的字段值填充為null ;
日期函數
https://blog.csdn.net/u013421629/article/details/80450047