1、基礎查詢
基本語法
select 字段列表|表達式|子查詢
from 表(子查詢|視圖|臨時表|普通表)
where [not] 條件A and|or 條件B --先:面向原始行進行篩選
group by 字段A[,字段B,...] => 分組【去重處理】
having 聚合條件(非原始字段條件) --再:針對聚合后的字段進行二次篩選
order|sort|cluster by 字段A[,字段B,...] --后:全局排序(非limit的最后一句) 走mapreduce
limit N(前N條記錄) | M(行號偏移量),N(記錄數)
1.where子句的條件格式
一:關系運算符
關系運算符:> , >= , < , <= , =【等值判斷】 , <>【不等于】
- 延伸:between (>=)SMALL_VALUE and (<=)BIG_VALUE; 【面向于 數值或日期】
二:邏輯運算符
邏輯運算符:not【非】 , and【與】 , or【或】
- 延伸:
--if函數:
if(BOOLEN_EXPR,VALUE_IF_TRUE,VALUE_IF_FALSE_OR_NULL)案例:select user_id,`if`(order_amount < 1000,'low','high') as consumptionfrom test1wwhere user_gender = '女'limit 100;結果展示:user_id consumption652,high376,high537,high280,high23,high--空值判斷:
1.nvl(VALUE_A,VALUE_B) => VALUE_A為空值(null),則返回VALUE_B。否則返回VALUE_A
2.isnull(VAL) => 如果 VAL 為 null,則返回 1 。否則返回 0--case when函數:
case EXPR when V1 then VAL1 when V2 then VAL2 ... else VALN end <=> switch ... case
case when 條件1 then VAL1 when 條件2 then VAL2 ... else VALN end <=> if ... else if ...案例:select user_id,case when order_amount<1000 then '低消費人群' when order_amount<5000 then '中等消費人群' else '高消費人群' end as levelfrom test1wwhere user_gender = '女'limit 100;結果展示:user_id level652,高消費人群376,高消費人群537,低消費人群280,中等消費人群...
三:通配符
模糊查詢:
基本語法:like '% | _' 【模糊匹配】講解:% => 任意個任意符號_ => 一個任意符號案例:select "張無極" like '張%'; => trueselect "張無極" like '張_'; => false
正則匹配:
基本語法:rlike '正則表達式'如:'^//d+$'案例:select "like" rlike '^[a-zA-Z]{2,4}$'; =>true
2.排序
1、order by 表達式[field|func|case...when...] ---【全局排序】:性能差優化:在order by B 之前,可以先對數據進行 distribute by A 與 sort by B=> 先部分排序,后全局排序2、sort by FIELD_N --在【每一個reducer端】排序解釋:當reducer 的數量為1時,等同于 order byFIELD_N 必須是select字段列表中的一員一般和 distribute by 配合使用3、cluster by --cluster by 字段A = distribute by 字段A + sort by 字段A
3.分組
1、group by 表達式(field|func|case...when) --為了聚合而分組,否則類似去重(代替distinct)目的:按照某些條件對數據進行分組并進行聚合操作,使用 group by多分組:1.group by A,B,C grouping sets(B,(A,C),(B,C)) ? --指定多個【分組】為:B,(A,C),(B,C)2.group by cube(A,B,C) --排列組合后的所有分組:A,B,C,(A,B),(A,C),(B,C),(A,B,C)3.group by rollup(A,B,C) --最左原則的所有分組:A,(A,B),(A,B,C)2、distribute by 表達式(field|func|case...when)目的:為了將數據分區,僅僅將數據分發到多個節點上并行處理,使用 distribute by解釋:1.不改變原始行數2.類似于 hadoop job 中的 Partitioner。 【默認是采用hash算法】3.指定按哪個字段的hashcode分區,配合【預先設置reducer數量】注意:distribute by【決定進哪個reducer】與sort by【在reducer中排序】一般搭配使用的distribute by通常使用在SORT BY語句之前
小型案例:
with product_total as ( select order_item_product_id product_id,sum(order_item_subtotal) totalfrom cb_order_itemsgroup by order_item_product_id
)
select product_id,total
from product_total
distribute by product_id
sort by total desc;
多分組案例
1.grouping sets 案例:?create temporary table tmp_cb_order_ymbsc_sets asselect year,month,dept_id,cate_id,prod_idgrouping__id,sum(quantity) as quantity,round(sum(amount)) as amountfrom tmp_cb_order_ymbscgroup by year,month,dept_id,cate_id,prod_idgrouping sets(prod_id,(dept_id,cate_id),(year,month),(year,month,prod_id))order by grouping__id;-------------------------------------尋找哪幾組【去重】:select grouping__idfrom tmp_cb_order_ymbsc_setsgroup by grouping__id;--------------------------------------- grouping__id:6 : year,month,prod_id7 : year,month25 : dept_id,cate_id 30 : prod_id2.cube 案例:【不常用】selectyear(order_date) as year,month(order_date) as month,day(order_date) as day,count(*) as count,grouping__idfrom cb_ordersgroup by cube (year(order_date),month(order_date),day(order_date))order by grouping__id;3.rollup 案例:【不常用】selectyear(order_date) as year,month(order_date) as month,day(order_date) as day,count(*) as count,grouping__idfrom cb_ordersgroup by rollup (year(order_date),month(order_date),day(order_date))order by grouping__id;
2、子查詢
基本語法
select 可以出現子查詢(查某個字段值,與主查詢存在邏輯主外鍵關系)
from 可以出現子查詢(數據表的子集 select F1,...,FN from T where ... group by ...)
where 可以出現子查詢(FIELD in|=|>= (select ONLY_ONE_FIELD_IN ...))
group by FIELD|substr(FIELD,0,4),...
having 可以出現子查詢(FIELD in|=|>= (select ONLY_ONE_FIELD_IN ...))
order by FIELD|substr(FIELD,0,4),...
常用語法【from子查詢】
select 字段列表|表達式|子查詢
from(select 字段列表|表達式|子查詢 ---先進行內部的查詢from TABLEwhere [not] 條件A and|or 條件B...
) ---后進行外部的查詢
where [not] 條件A and|or 條件B --后=>先:面向原始行進行篩選
group by 字段A[,字段B,...]
order by 字段A[,字段B,...] --后=>再:針對聚合后的字段進行二次篩選
limit N(前N條記錄) | M(行號偏移量),N(記錄數) --后=>后:全局排序(非limit的最后一句)
3、CTE
基本語法
with
SUB_ALIA as(...),
SUB_ALTER as(select...from SUB_ALIA...)
select...
小型案例
withtotal_amount as(select sum(order_amount) totalfrom hive_internal_par_regex_test1wwhere year>=2016group by user_gender, user_idhaving total>=20000),level_amount as(select round(total/10000) as levelfrom total_amount)
select level,count(*) as level_count
from level_amount
group by level;結果展示:level level_count2,1623,1254,265,5
4、聯合查詢
數據準備
Class表:
+-------+---------+
|classId|className|
+-------+---------+
| 1| yb12211|
| 2| yb12309|
| 3| yb12401|
+-------+---------+Student表:
+-----+-------+
| name|classId|
+-----+-------+
|henry| 1|
|ariel| 2|
| jack| 1|
| rose| 4|
|jerry| 2|
| mary| 1|
+-----+-------+
三種主要形式
一:內連接【inner join】
兩集合取交集:
select A.內容,....,B.內容,... =>字段別名:提高篩選的性能
from TABLE_A as A
inner join TABLE_B as B
on A.主鍵=B.外鍵 (and A.fa = VALUE...) 多表√ 兩表√ =>表進行合并時進行【連接條件】
where A.fa = VALUE; 兩表√ =>合并后進行【條件篩選】
group by ...
having ...
order by ...
limit ...
小型案例:
select * from Student S
inner join Class C
on S.classId = C.classId結果展示:+-----+-------+-------+---------+| name|classId|classId|className|+-----+-------+-------+---------+|henry| 1| 1| yb12211||ariel| 2| 2| yb12309|| jack| 1| 1| yb12211||jerry| 2| 2| yb12309|| mary| 1| 1| yb12211|+-----+-------+-------+---------+
二:外連接
左外連接【left join】
兩個集合取左全集,右交集
select A.內容,....,B.內容,... =>字段別名:提高篩選的性能
from TABLE_A as A 【A為主表】
left [outer] join TABLE_B as B 【B為從表】
on A.主鍵|外鍵=B.外鍵|主鍵 (and A.fa = VALUE...) 多表√ 兩表√ =>表進行合并時進行【連接條件】
where A.fa = VALUE; 兩表√ =>合并后進行【條件篩選】
group by ...
having ...
order by ...
limit ...
小型案例:
select * from Student S
left join Class C
on S.classId = C.classId結果展示:+-----+-------+-------+---------+| name|classId|classId|className|+-----+-------+-------+---------+|henry| 1| 1| yb12211||ariel| 2| 2| yb12309|| jack| 1| 1| yb12211|| rose| 4| null| null||jerry| 2| 2| yb12309|| mary| 1| 1| yb12211|+-----+-------+-------+---------+
右外連接【right join】
兩集合取右全集,左交集
select A.內容,....,B.內容,... =>字段別名:提高篩選的性能
from TABLE_A as A 【A為主表】
right [outer] join TABLE_B as B 【B為從表】
on A.主鍵|外鍵=B.外鍵|主鍵 (and A.fa = VALUE;) 多表√ 兩表√ =>表進行合并時進行【連接條件】
where A.fa = VALUE; 兩表√ =>合并后進行【條件篩選】
group by ...
having ...
order by ...
limit ...
小型案例:
select * from Student S
right join Class C
on S.classId = C.classId結果展示:+-----+-------+-------+---------+| name|classId|classId|className|+-----+-------+-------+---------+| mary| 1| 1| yb12211|| jack| 1| 1| yb12211||henry| 1| 1| yb12211||jerry| 2| 2| yb12309||ariel| 2| 2| yb12309|| null| null| 3| yb12401|+-----+-------+-------+---------+
全外連接【full join】
兩集合取左右全集
select A.內容,....,B.內容,... =>字段別名:提高篩選的性能
from TABLE_A as A 【A為主表】
full [outer] join TABLE_B as B 【B為從表】
on A.主鍵|外鍵=B.外鍵|主鍵 (and A.fa = VALUE;) 多表√ 兩表√ =>表進行合并時進行【連接條件】
where A.fa = VALUE; 兩表√ =>合并后進行【條件篩選】
group by ...
having ...
order by ...
limit ...
小型案例:
select * from Student S
full join Class C
on S.classId = C.classId結果展示:+-----+-------+-------+---------+| name|classId|classId|className|+-----+-------+-------+---------+|henry| 1| 1| yb12211|| jack| 1| 1| yb12211|| mary| 1| 1| yb12211|| null| null| 3| yb12401|| rose| 4| null| null||ariel| 2| 2| yb12309||jerry| 2| 2| yb12309|+-----+-------+-------+---------+
三:交叉連接【cross join】
兩集合取笛卡爾積
select A.內容,....,B.內容,... =>字段別名:提高篩選的性能
from TABLE_A as A 【A為主表】
cross join TABLE_B as B 【B為從表】
on A.主鍵|外鍵=B.外鍵|主鍵 (and A.fa = VALUE;) 多表√ 兩表√ =>表進行合并時進行【連接條件】
where A.fa = VALUE; 兩表√ =>合并后進行【條件篩選】
group by ...
having ...
order by ...
limit ...
小型案例:
select * from Student S
cross join Class C
on S.classId = C.classId結果展示:+-----+-------+-------+---------+| name|classId|classId|className|+-----+-------+-------+---------+|henry| 1| 1| yb12211||henry| 1| 2| yb12309||henry| 1| 3| yb12401||ariel| 2| 1| yb12211||ariel| 2| 2| yb12309||ariel| 2| 3| yb12401|| jack| 1| 1| yb12211|| jack| 1| 2| yb12309|| jack| 1| 3| yb12401|| rose| 4| 1| yb12211|| rose| 4| 2| yb12309|| rose| 4| 3| yb12401||jerry| 2| 1| yb12211||jerry| 2| 2| yb12309||jerry| 2| 3| yb12401|| mary| 1| 1| yb12211|| mary| 1| 2| yb12309|| mary| 1| 3| yb12401|+-----+-------+-------+---------+
5、聯合查詢
何為聯合查詢?
-
縱向拼接表,高變大
-
查詢字段的【數量】與【類型】必須相同,字段名是以【第一張表為準】。
union與union all的區分
-
union:合并后刪除重復項(去重)
-
union all:合并后保留重復項 ?
小型案例
數據準備:
語句:
select age,job from bank_client_info_3
union all
select age,job from bank_client_info_3;