Hive的基本操作（查詢）

1、基礎查詢

基本語法

select 字段列表|表達式|子查詢
from 表(子查詢|視圖|臨時表|普通表)
where [not] 條件A and|or 條件B				--先：面向原始行進行篩選
group by 字段A[,字段B,...]					=> 分組【去重處理】
having 聚合條件(非原始字段條件)				--再：針對聚合后的字段進行二次篩選
order|sort|cluster by 字段A[,字段B,...]		--后：全局排序(非limit的最后一句)	走mapreduce
limit N(前N條記錄) | M(行號偏移量),N(記錄數)

1.where子句的條件格式

一：關系運算符

關系運算符：> , >= , < , <= , =【等值判斷】 , <>【不等于】

延伸：between (>=)SMALL_VALUE and (<=)BIG_VALUE; 【面向于數值或日期】

二：邏輯運算符

邏輯運算符：not【非】 , and【與】 , or【或】

延伸：

--if函數：
if(BOOLEN_EXPR,VALUE_IF_TRUE,VALUE_IF_FALSE_OR_NULL)案例：select user_id,`if`(order_amount < 1000,'low','high') as consumptionfrom test1wwhere user_gender = '女'limit 100;結果展示：user_id	consumption652,high376,high537,high280,high23,high--空值判斷：
1.nvl(VALUE_A,VALUE_B)	=>	VALUE_A為空值(null),則返回VALUE_B。否則返回VALUE_A
2.isnull(VAL)		=>	如果 VAL 為 null,則返回 1 。否則返回 0--case when函數：
case EXPR when V1 then VAL1 when V2 then VAL2 ... else VALN end <=> switch ... case
case when 條件1 then VAL1 when 條件2 then VAL2 ... else VALN end <=> if ... else if ...案例：select user_id,case when order_amount<1000 then '低消費人群' when order_amount<5000 then '中等消費人群' else '高消費人群' end as levelfrom test1wwhere user_gender = '女'limit 100;結果展示：user_id	level652,高消費人群376,高消費人群537,低消費人群280,中等消費人群...

三：通配符

模糊查詢：

基本語法：like '% | _'	【模糊匹配】講解：% => 任意個任意符號_ => 一個任意符號案例：select "張無極" like '張%';		=> trueselect "張無極" like '張_';		=> false

正則匹配：

基本語法：rlike '正則表達式'如：'^//d+$'案例：select "like" rlike '^[a-zA-Z]{2,4}$';	  =>true

2.排序

1、order by 表達式[field|func|case...when...]    		---【全局排序】：性能差優化：在order by B 之前，可以先對數據進行 distribute by A 與 sort by B=> 先部分排序，后全局排序2、sort by FIELD_N 								  --在【每一個reducer端】排序解釋：當reducer 的數量為1時，等同于 order byFIELD_N 必須是select字段列表中的一員一般和 distribute by 配合使用3、cluster by 		--cluster by 字段A = distribute by 字段A + sort by 字段A

3.分組

1、group by 表達式(field|func|case...when) 	--為了聚合而分組，否則類似去重(代替distinct)目的：按照某些條件對數據進行分組并進行聚合操作，使用 group by多分組：1.group by A,B,C 		grouping sets(B,(A,C),(B,C))	?  --指定多個【分組】為：B,(A,C),(B,C)2.group by cube(A,B,C) 		--排列組合后的所有分組：A,B,C,(A,B),(A,C),(B,C),(A,B,C)3.group by rollup(A,B,C)	--最左原則的所有分組：A,(A,B),(A,B,C)2、distribute by 表達式(field|func|case...when)目的：為了將數據分區，僅僅將數據分發到多個節點上并行處理，使用 distribute by解釋：1.不改變原始行數2.類似于 hadoop job 中的 Partitioner。 【默認是采用hash算法】3.指定按哪個字段的hashcode分區，配合【預先設置reducer數量】注意：distribute by【決定進哪個reducer】與sort by【在reducer中排序】一般搭配使用的distribute by通常使用在SORT BY語句之前

小型案例：

with product_total as ( select order_item_product_id product_id,sum(order_item_subtotal) totalfrom cb_order_itemsgroup by order_item_product_id
)
select product_id,total
from product_total
distribute by product_id
sort by total desc;

多分組案例

1.grouping sets 案例：?create temporary table tmp_cb_order_ymbsc_sets asselect year,month,dept_id,cate_id,prod_idgrouping__id,sum(quantity) as quantity,round(sum(amount)) as amountfrom tmp_cb_order_ymbscgroup by year,month,dept_id,cate_id,prod_idgrouping sets(prod_id,(dept_id,cate_id),(year,month),(year,month,prod_id))order by grouping__id;-------------------------------------尋找哪幾組【去重】：select grouping__idfrom tmp_cb_order_ymbsc_setsgroup by grouping__id;--------------------------------------- grouping__id:6 :	year,month,prod_id7 :	year,month25 : dept_id,cate_id 	30 : prod_id2.cube 案例：【不常用】selectyear(order_date) as year,month(order_date) as month,day(order_date) as day,count(*) as count,grouping__idfrom cb_ordersgroup by cube (year(order_date),month(order_date),day(order_date))order by grouping__id;3.rollup 案例：【不常用】selectyear(order_date) as year,month(order_date) as month,day(order_date) as day,count(*) as count,grouping__idfrom cb_ordersgroup by rollup (year(order_date),month(order_date),day(order_date))order by grouping__id;

2、子查詢

基本語法

select 			可以出現子查詢(查某個字段值,與主查詢存在邏輯主外鍵關系)
from 			可以出現子查詢(數據表的子集 select F1,...,FN from T where ... group by ...)
where 			可以出現子查詢(FIELD in|=|>= (select ONLY_ONE_FIELD_IN ...))
group by FIELD|substr(FIELD,0,4),...
having 			可以出現子查詢(FIELD in|=|>= (select ONLY_ONE_FIELD_IN ...))
order by FIELD|substr(FIELD,0,4),...

常用語法【from子查詢】

select 字段列表|表達式|子查詢
from(select 字段列表|表達式|子查詢					 ---先進行內部的查詢from TABLEwhere [not] 條件A and|or 條件B...
)												---后進行外部的查詢
where [not] 條件A and|or 條件B						--后=>先：面向原始行進行篩選
group by 字段A[,字段B,...]	
order by 字段A[,字段B,...]							--后=>再：針對聚合后的字段進行二次篩選
limit N(前N條記錄) | M(行號偏移量),N(記錄數)		--后=>后：全局排序(非limit的最后一句)

3、CTE

基本語法

with 
SUB_ALIA as(...),
SUB_ALTER as(select...from SUB_ALIA...)
select...

小型案例

withtotal_amount as(select sum(order_amount) totalfrom hive_internal_par_regex_test1wwhere year>=2016group by user_gender, user_idhaving total>=20000),level_amount as(select round(total/10000) as levelfrom total_amount)
select level,count(*) as level_count
from level_amount
group by level;結果展示：level level_count2,1623,1254,265,5

4、聯合查詢

數據準備

Class表：
+-------+---------+
|classId|className|
+-------+---------+
|      1|  yb12211|
|      2|  yb12309|
|      3|  yb12401|
+-------+---------+Student表：
+-----+-------+
| name|classId|
+-----+-------+
|henry|      1|
|ariel|      2|
| jack|      1|
| rose|      4|
|jerry|      2|
| mary|      1|
+-----+-------+

三種主要形式

一：內連接【inner join】

兩集合取交集：

select A.內容,....,B.內容,...                              =>字段別名：提高篩選的性能
from TABLE_A as A												
inner join TABLE_B as B
on A.主鍵=B.外鍵 (and A.fa = VALUE...)  多表√ 兩表√        	=>表進行合并時進行【連接條件】
where A.fa = VALUE;                     兩表√             =>合并后進行【條件篩選】
group by ...
having ...
order by ...
limit ...

小型案例：

select * from Student S
inner join Class C
on S.classId = C.classId結果展示：+-----+-------+-------+---------+| name|classId|classId|className|+-----+-------+-------+---------+|henry|      1|      1|  yb12211||ariel|      2|      2|  yb12309|| jack|      1|      1|  yb12211||jerry|      2|      2|  yb12309|| mary|      1|      1|  yb12211|+-----+-------+-------+---------+

二：外連接

左外連接【left join】

兩個集合取左全集，右交集

select A.內容,....,B.內容,...                              	     =>字段別名：提高篩選的性能
from TABLE_A as A                 									【A為主表】
left [outer] join TABLE_B as B		    							【B為從表】
on A.主鍵|外鍵=B.外鍵|主鍵 (and A.fa = VALUE...)    多表√ 兩表√     =>表進行合并時進行【連接條件】
where A.fa = VALUE;                                 兩表√        =>合并后進行【條件篩選】
group by ...
having ...
order by ...
limit ...

小型案例：

select * from Student S
left join Class C
on S.classId = C.classId結果展示：+-----+-------+-------+---------+| name|classId|classId|className|+-----+-------+-------+---------+|henry|      1|      1|  yb12211||ariel|      2|      2|  yb12309|| jack|      1|      1|  yb12211|| rose|      4|   null|     null||jerry|      2|      2|  yb12309|| mary|      1|      1|  yb12211|+-----+-------+-------+---------+

右外連接【right join】

兩集合取右全集，左交集

select A.內容,....,B.內容,...                              		=>字段別名：提高篩選的性能
from TABLE_A as A                 										【A為主表】
right [outer] join TABLE_B as B		    								【B為從表】
on A.主鍵|外鍵=B.外鍵|主鍵 (and A.fa = VALUE;)    多表√ 兩表√      =>表進行合并時進行【連接條件】
where A.fa = VALUE;                               兩表√         =>合并后進行【條件篩選】
group by ...
having ...
order by ...
limit ...

小型案例：

select * from Student S
right join Class C
on S.classId = C.classId結果展示：+-----+-------+-------+---------+| name|classId|classId|className|+-----+-------+-------+---------+| mary|      1|      1|  yb12211|| jack|      1|      1|  yb12211||henry|      1|      1|  yb12211||jerry|      2|      2|  yb12309||ariel|      2|      2|  yb12309|| null|   null|      3|  yb12401|+-----+-------+-------+---------+

全外連接【full join】

兩集合取左右全集

select A.內容,....,B.內容,...                              		 =>字段別名：提高篩選的性能
from TABLE_A as A                 										【A為主表】
full [outer] join TABLE_B as B		    								【B為從表】
on A.主鍵|外鍵=B.外鍵|主鍵 (and A.fa = VALUE;)    多表√ 兩表√       =>表進行合并時進行【連接條件】
where A.fa = VALUE;                               兩表√          =>合并后進行【條件篩選】
group by ...
having ...
order by ...
limit ...

小型案例：

select * from Student S
full join Class C
on S.classId = C.classId結果展示：+-----+-------+-------+---------+| name|classId|classId|className|+-----+-------+-------+---------+|henry|      1|      1|  yb12211|| jack|      1|      1|  yb12211|| mary|      1|      1|  yb12211|| null|   null|      3|  yb12401|| rose|      4|   null|     null||ariel|      2|      2|  yb12309||jerry|      2|      2|  yb12309|+-----+-------+-------+---------+

三：交叉連接【cross join】

兩集合取笛卡爾積

select A.內容,....,B.內容,...                              		 =>字段別名：提高篩選的性能
from TABLE_A as A                 										【A為主表】
cross join TABLE_B as B		    										【B為從表】
on A.主鍵|外鍵=B.外鍵|主鍵 (and A.fa = VALUE;)    多表√ 兩表√       =>表進行合并時進行【連接條件】
where A.fa = VALUE;                               兩表√          =>合并后進行【條件篩選】
group by ...
having ...
order by ...
limit ...

小型案例：

select * from Student S
cross join Class C
on S.classId = C.classId結果展示：+-----+-------+-------+---------+| name|classId|classId|className|+-----+-------+-------+---------+|henry|      1|      1|  yb12211||henry|      1|      2|  yb12309||henry|      1|      3|  yb12401||ariel|      2|      1|  yb12211||ariel|      2|      2|  yb12309||ariel|      2|      3|  yb12401|| jack|      1|      1|  yb12211|| jack|      1|      2|  yb12309|| jack|      1|      3|  yb12401|| rose|      4|      1|  yb12211|| rose|      4|      2|  yb12309|| rose|      4|      3|  yb12401||jerry|      2|      1|  yb12211||jerry|      2|      2|  yb12309||jerry|      2|      3|  yb12401|| mary|      1|      1|  yb12211|| mary|      1|      2|  yb12309|| mary|      1|      3|  yb12401|+-----+-------+-------+---------+

5、聯合查詢

何為聯合查詢？

縱向拼接表，高變大
查詢字段的【數量】與【類型】必須相同，字段名是以【第一張表為準】。

union與union all的區分

union：合并后刪除重復項（去重）
union all：合并后保留重復項 ?

小型案例

數據準備：

語句：

select age,job from bank_client_info_3
union all
select age,job from bank_client_info_3;

Hive的基本操作（查詢）

1、基礎查詢

基本語法

1.where子句的條件格式

一：關系運算符

二：邏輯運算符

三：通配符

2.排序

3.分組

多分組案例

2、子查詢

基本語法

常用語法【from子查詢】

3、CTE

基本語法

小型案例

4、聯合查詢

數據準備

三種主要形式

一：內連接【inner join】

二：外連接

左外連接【left join】

右外連接【right join】

全外連接【full join】

三：交叉連接【cross join】

5、聯合查詢

何為聯合查詢？

union與union all的區分

小型案例

相關文章