Learning Hive (Hive Basics)

Introduction to Hive Basics

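I. Introduction to Hive
II. Hive Data Types
- 1. Primitive data types
- 2. Complex data types
III. Hive DDL Operations
IV. Creating a Table
- 1. Create-table statement
V. Modifying Table Structure
- 1. Renaming a table
- 2. Adding or modifying columns
- 3. Modifying partitions
VI. Common Functions
VII. One-to-One Joins
- left join
- right join
- Inner join
- Full join
- Rows that exist only in table A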

I. Introduction to Hive

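What is Hive?
Hive is a data warehouse tool built on Hadoop. It maps structured data files to database tables and provides SQL query capability.
In essence: it translates SQL into MapReduce programs.
Main use: offline data analysis, with higher development efficiency than writing MapReduce code directly.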
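
One way to look at the translation from SQL to execution stages is the `explain` statement, which prints the plan Hive would run instead of running the query. The sketch below assumes a hypothetical table `student` with an `age` column; the exact plan text depends on the Hive version and the configured execution engine.

```sql
-- Hypothetical table `student`; only the plan is printed, no job is started.
explain
select age, count(*) as cnt
from student
group by age;
```

With the MapReduce engine, the printed plan typically lists a stage whose operator trees are split into a map side and a reduce side.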
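Differences between Hive and a traditional database
Item	Hive	Traditional database
① Query language	HQL	SQL
② Data storage	HDFS	Raw device or local file system
③ Execution engine	MapReduce	Its own executor
④ Data insertion	Bulk load and single-row insert	Single-row or bulk insert
⑤ Data modification	Overwrite and append	Row-level update and delete
⑥ Data scale	Large	Small
⑦ Execution latency	High	Low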
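Hive does not support updating or deleting individual rows because its data is stored in HDFS, where a delete is a physical delete and therefore expensive; only overwrite and append are supported. (This describes ordinary tables; transactional ACID tables, available since Hive 0.14, are the exception.)
Hive scales well because it runs on a Hadoop cluster, so applications can be developed across the cluster's many servers.
Hive uses schema on read, which makes loading fast: data is not validated when it is loaded into a table, only when it is read. Query latency is spent mainly on resource scheduling: splitting the job into tasks and then requesting resources for the compute tasks.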
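
The following is a minimal sketch of schema on read and of the overwrite/append model described above. The table names `page_views` and `page_views_staging` and the HDFS path are hypothetical, chosen only for illustration.

```sql
-- Hypothetical text table.
create table if not exists page_views (
  user_id bigint,
  url     string
)
row format delimited fields terminated by '\t';

-- Loading only moves the file into the table's directory; field contents are not checked.
load data inpath '/tmp/page_views.tsv' into table page_views;

-- Checking happens at read time: a user_id that cannot be parsed as bigint comes back as NULL.
select user_id, url from page_views limit 10;

-- Append rows to the existing data.
insert into table page_views
select user_id, url from page_views_staging;

-- Replace the table's contents.
insert overwrite table page_views
select user_id, url from page_views_staging;
```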

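II. Hive Data Types

1. Primitive data types

Type	Description	Example
boolean	true/false	true
tinyint	1-byte signed integer	1
smallint	2-byte signed integer	1
int	4-byte signed integer	1
bigint	8-byte signed integer	1
float	4-byte single-precision floating point	1.0
double	8-byte double-precision floating point	1.0
string	Character string	'abc'
varchar	Variable-length character string with a declared maximum length	'abc'
timestamp	Timestamp	'2019-07-15 10:31:13'
date	Date	'2019-07-15'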
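
As a rough illustration of the types listed above, the sketch below declares one column per primitive type and shows a few conversions with `cast`. The table name `type_demo` and its column names are hypothetical, and the `select` without a `from` clause assumes Hive 0.13 or later.

```sql
-- Hypothetical table with one column per primitive type.
create table if not exists type_demo (
  is_active  boolean,
  tiny_val   tinyint,
  small_val  smallint,
  int_val    int,
  big_val    bigint,
  float_val  float,
  double_val double,
  name       string,
  code       varchar(10),
  created_at timestamp,
  birth_day  date
);

-- cast converts between types; a string that cannot be converted yields NULL.
select cast('1' as int), cast('abc' as int), cast('2019-07-15' as date);
```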

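2. Complex data types

Type	Description	Example
array	Ordered collection of elements of the same type	array(1,2,3)
map	Unordered key-value pairs: map(k1,v1,k2,v2)	map('a','1','b','2')
struct	A set of named fields whose types may differ: struct(val1, val2, ...)	struct('a',1,2,0)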

create table complex (
  col1 array<int>,
  col2 map<string,int>,
  col3 struct<a:string,b:int,c:int>
);

select map_keys(col2), map_values(col2) from complex;
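
To show how values inside these types are read back, here is a small sketch against the `complex` table. The second table, `complex_text`, is a hypothetical variant that spells out the delimiters a text file would need for the same three columns.

```sql
-- Element access on the `complex` table defined above.
-- Array indexes are zero-based; a map lookup returns NULL when the key is absent.
select
  col1[0]          as first_element,
  size(col1)       as array_length,
  col2['a']        as value_for_a,
  map_keys(col2)   as all_keys,
  map_values(col2) as all_values,
  col3.a           as struct_field_a
from complex;

-- Hypothetical text-backed variant: delimiters for fields, collection items, and map keys.
create table if not exists complex_text (
  col1 array<int>,
  col2 map<string,int>,
  col3 struct<a:string,b:int,c:int>
)
row format delimited
  fields terminated by '\t'
  collection items terminated by ','
  map keys terminated by ':';
```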

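III. Hive DDL Operations

show databases;
show databases like 'db_hive*';
-- show detailed information about a database
desc database extended db_hive;
-- switch the current database
use db_hive;
-- drop a database
drop database if exists db_hive;
-- force drop (also removes the tables it contains)
drop database if exists db_hive cascade;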
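
The statements above assume the database already exists. A possible end-to-end sequence, using the same `db_hive` name and an illustrative comment string, might look like this:

```sql
-- Create the database only if it is missing.
create database if not exists db_hive
comment 'demo database';

show databases like 'db_hive*';
use db_hive;
show tables;

-- Without cascade, dropping a database that still contains tables fails.
drop database if exists db_hive cascade;
```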

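IV. Creating a Table

1. Create-table statement

Syntax template (square brackets mark optional keywords):

create [external] table [if not exists] table_name (
  col_name data_type comment 'column description'
)
-- field delimiter within each row
row format delimited fields terminated by '\t'
-- storage file format: sequencefile (binary sequence file), textfile (plain text), rcfile (columnar storage file); defaults to textfile when omitted
stored as orc;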
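
A concrete instance of the template may help. The sketch below creates an external text table; the table name `student_ext`, its columns, and the HDFS location are all hypothetical.

```sql
-- Hypothetical external table over tab-separated text files.
create external table if not exists student_ext (
  id   int    comment 'student id',
  name string comment 'student name',
  age  int    comment 'age'
)
row format delimited fields terminated by '\t'
stored as textfile
location '/data/student_ext';

-- Shows the table type (external or managed), location, and storage format.
describe formatted student_ext;
```

Dropping an external table removes only its metadata; the files under the given location are left in place, which is the usual reason for choosing `external`.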

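(1) Create table from a query (CTAS)
The AS clause stores the result of a query in a new table.

create table if not exists student1 as select id, name, age from student;

Create table with LIKE
LIKE copies only the structure of an existing table, without its data.

create table if not exists student2 like student;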
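
The difference between the two approaches can be sketched as follows, assuming a source table `student` with columns `id`, `name`, and `age`. The target table names are hypothetical.

```sql
-- CTAS: the new table gets the selected columns and the selected rows.
create table if not exists student_adult as
select id, name, age from student where age >= 18;

-- LIKE: the new table gets the same definition as `student` but no rows.
create table if not exists student_empty like student;

-- Compare the two definitions and check that no data was copied by LIKE.
describe student_adult;
describe student_empty;
select count(*) from student_empty;
```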

(2) Creating partitioned tables
Single-level partition

create table student_partition1(id int,name string,age int
)
partitioned by (dt string)
row format delimited fields terminated by '\t';

Two-level partition

create table student_partition2(id int,name string,age int
)
partitioned by (dt string, day string)
row format delimited fields terminated by '\t';
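
Once a partitioned table exists, data is written into a specific partition and queries can filter on the partition column. The sketch below uses `student_partition1`; the local file path and the source table `student` are assumptions made for illustration.

```sql
-- Static partition: the partition value is stated explicitly in the statement.
load data local inpath '/tmp/student_20230402.tsv'
into table student_partition1 partition (dt = '20230402');

insert into table student_partition1 partition (dt = '20230403')
select id, name, age from student;

-- Filtering on the partition column lets Hive read only the matching partition directory.
select id, name, age
from student_partition1
where dt = '20230402';
```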

V. Modifying Table Structure

1. Renaming a table

alter table student_partition1 rename to student_p1;

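2. Adding or modifying columns

Add columns

alter table student add columns (col_name data_type);

Change a column

alter table student change column old_col_name new_col_name new_data_type;

Replace columns

alter table student replace columns (deptno string, dname string, loc string);
This replaces all existing columns in the table.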
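
With placeholders filled in, the column statements could look like the sketch below. It assumes `student` currently has columns `id int`, `name string`, and `age int`; the new column `email` and the new name `full_name` are hypothetical.

```sql
-- Add a new column at the end of the column list.
alter table student add columns (email string comment 'contact email');

-- change column takes the old name, the new name, and the new type.
-- Repeating the name keeps it and changes only the type.
alter table student change column age age bigint;

-- Rename a column while keeping its type.
alter table student change column name full_name string;

describe student;
```

These statements change the table metadata only; the data files already stored for the table are not rewritten.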

3. Modifying partitions

-- add a single partition
alter table student add partition (dt='20230402');
-- add multiple partitions
alter table student add partition (dt='20230402') partition (dt='20230403');
-- drop a partition
alter table student drop partition (dt='20200401');
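
When these statements run from a scheduled script, adding a partition that already exists or dropping one that is missing raises an error. A sketch of the more forgiving variants, shown here against the partitioned table `student_partition1` as an assumption:

```sql
-- if not exists / if exists make the statements safe to re-run.
alter table student_partition1 add if not exists partition (dt = '20230402');

alter table student_partition1 add if not exists
  partition (dt = '20230403')
  partition (dt = '20230404');

-- List the partitions that currently exist.
show partitions student_partition1;

alter table student_partition1 drop if exists partition (dt = '20230402');
```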

VI. Common Functions

VII. One-to-One Joins

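left join

Table A on the left is the driving table: each row of A is matched against the right table (A acts as the outer loop). Where the right table has no match, its columns are filled with NULL.

right join

The mirror image of left join: the right table is the driving table, and unmatched columns from the left table are filled with NULL.

Inner join

Returns only the rows that appear in both tables.

Full join

Returns all rows that appear in either table.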
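
The four join types can be compared on a small example. The sketch below assumes two hypothetical tables that start empty, with A holding ids 1 and 2 and B holding ids 2 and 3; the column names `name` and `score` are invented for illustration, and `insert ... values` assumes Hive 0.14 or later.

```sql
-- Hypothetical sample data.
create table if not exists A (id int, name string);
create table if not exists B (id int, score int);
insert into table A values (1, 'ann'), (2, 'bob');
insert into table B values (2, 90), (3, 75);

-- left join: one row per row of A (ids 1 and 2); score is NULL for id 1.
select A.id, A.name, B.score from A left join B on A.id = B.id;

-- right join: one row per row of B (ids 2 and 3); name is NULL for id 3.
select B.id, A.name, B.score from A right join B on A.id = B.id;

-- inner join: only id 2, which appears in both tables.
select A.id, A.name, B.score from A join B on A.id = B.id;

-- full outer join: ids 1, 2, and 3, with NULL on whichever side has no match.
select coalesce(A.id, B.id) as id, A.name, B.score
from A full outer join B on A.id = B.id;
```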

Rows that exist only in table A

select * from A
left join B on A.ID = B.ID
where B.ID is null;
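
This kind of anti-join depends on testing with `is null`, because a comparison against NULL using `=` never evaluates to true. The sketch below shows that behaviour and an alternative formulation with `not exists`, which assumes Hive 0.13 or later and returns only A's columns.

```sql
-- The first expression is NULL rather than true; only the second is a usable NULL test.
select 1 = null, 1 is null;

-- Alternative way to keep the rows of A that have no match in B.
select A.*
from A
where not exists (select 1 from B where B.ID = A.ID);
```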
