Hive Guide: Installing and Using Hive (JFK)

Contents

Installing Hive

Basic Hive usage: CRUD

Hive interactive mode

Data import

Data export

Hive queries: HiveQL

Hive views

Hive partitioned tables

1. Installing Hive

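System environment

Once the Hadoop environment is set up, Hive can be installed on the NameNode machine (c1). For the Hadoop setup, see the article series "Running Hadoop in the Cloud" and "RHadoop Practice Series, Part 1: Hadoop Environment Setup".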

Download: hive-0.9.0.tar.gz

Extract to: /home/cos/toolkit/hive-0.9.0

Configure Hive

~ cd /home/cos/toolkit/hive-0.9.0

~ cp conf/hive-default.xml.template conf/hive-site.xml

~ cp conf/hive-log4j.properties.template conf/hive-log4j.properties

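Edit hive-site.xml so that Hive stores its metadata in MySQL

~ vi conf/hive-site.xml

<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://c1:3306/hive_metadata?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>

<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>username to use against metastore database</description>
</property>

<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
<description>password to use against metastore database</description>
</property>

<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>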
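The metastore properties above can also be written out programmatically. The following is a minimal Python sketch, not part of the original setup, that renders the same name/value pairs in the `<property>` layout used by Hadoop and Hive configuration files. The function name `render_hive_site` is hypothetical, and the result is printed rather than written into `conf/`.

```python
import xml.etree.ElementTree as ET

# Values copied from the hive-site.xml settings listed above.
METASTORE_PROPERTIES = {
    "javax.jdo.option.ConnectionURL":
        "jdbc:mysql://c1:3306/hive_metadata?createDatabaseIfNotExist=true",
    "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
    "javax.jdo.option.ConnectionUserName": "hive",
    "javax.jdo.option.ConnectionPassword": "hive",
    "hive.metastore.warehouse.dir": "/user/hive/warehouse",
}


def render_hive_site(properties):
    """Return a <configuration> document with one <property> per entry."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


xml_text = render_hive_site(METASTORE_PROPERTIES)
print(xml_text)
```

Generating the file this way is only a convenience when the same settings have to be rolled out to several machines; editing the file by hand works just as well.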

Edit hive-log4j.properties

#log4j.appender.EventCounter=org.apache.hadoop.metrics.jvm.EventCounter

log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter

Set environment variables

~ sudo vi /etc/environment

PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/home/cos/toolkit/ant184/bin:/home/cos/toolkit/jdk16/bin:/home/cos/toolkit/maven3/bin:/home/cos/toolkit/hadoop-1.0.3/bin:/home/cos/toolkit/hive-0.9.0/bin"

JAVA_HOME=/home/cos/toolkit/jdk16

ANT_HOME=/home/cos/toolkit/ant184

MAVEN_HOME=/home/cos/toolkit/maven3

HADOOP_HOME=/home/cos/toolkit/hadoop-1.0.3

HIVE_HOME=/home/cos/toolkit/hive-0.9.0

CLASSPATH=/home/cos/toolkit/jdk16/lib/dt.jar:/home/cos/toolkit/jdk16/lib/tools.jar

Create the directories on HDFS

$HADOOP_HOME/bin/hadoop fs -mkdir /tmp

$HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse

$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp

$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse

Create the database in MySQL

create database hive_metadata;

grant all on hive_metadata.* to hive@'%' identified by 'hive';

grant all on hive_metadata.* to hive@localhost identified by 'hive';

ALTER DATABASE hive_metadata CHARACTER SET latin1;

Manually copy the MySQL JDBC driver into hive/lib

~ ls /home/cos/toolkit/hive-0.9.0/lib

mysql-connector-java-5.1.22-bin.jar

Start Hive

# Start the metastore service

~ bin/hive --service metastore &

Starting Hive Metastore Server

# Start the hiveserver service

~ bin/hive --service hiveserver &

Starting Hive Thrift Server

# Start the Hive client

~ bin/hive shell

Logging initialized using configuration in file:/root/hive-0.9.0/conf/hive-log4j.properties

Hive history file=/tmp/root/hive_job_log_root_201211141845_1864939641.txt

hive> show tables;

Query the metadata in the MySQL database

~ mysql -uroot -p

mysql> use hive_metadata;

Database changed

mysql> show tables;

+-------------------------+

| Tables_in_hive_metadata |

+-------------------------+

| BUCKETING_COLS |

| CDS |

| COLUMNS_V2 |

| DATABASE_PARAMS |

| DBS |

| IDXS |

| INDEX_PARAMS |

| PARTITIONS |

| PARTITION_KEYS |

| PARTITION_KEY_VALS |

| PARTITION_PARAMS |

| PART_COL_PRIVS |

| PART_PRIVS |

| SDS |

| SD_PARAMS |

| SEQUENCE_TABLE |

| SERDES |

| SERDE_PARAMS |

| SORT_COLS |

| TABLE_PARAMS |

| TBLS |

| TBL_COL_PRIVS |

| TBL_PRIVS |

+-------------------------+

23 rows in set (0.00 sec)

Hive is now installed. The rest of this guide covers how to use it.

2. Basic Hive usage

Enter the Hive console

~ cd /home/cos/toolkit/hive-0.9.0

~ bin/hive shell

Logging initialized using configuration in file:/home/cos/toolkit/hive-0.9.0/conf/hive-log4j.properties

Hive history file=/tmp/cos/hive_job_log_cos_201307160003_95040367.txt

hive>

Create a table

# Create the data (text, tab-delimited)

~ vi /home/cos/demo/t_hive.txt

16 2 3

61 12 13

41 2 31

17 21 3

71 2 31

1 12 34

11 2 34

# Create the new table

hive> CREATE TABLE t_hive (a int, b int, c int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

Time taken: 0.489 seconds

# Load t_hive.txt into the t_hive table

hive> LOAD DATA LOCAL INPATH '/home/cos/demo/t_hive.txt' OVERWRITE INTO TABLE t_hive ;

Copying data from file:/home/cos/demo/t_hive.txt

Copying file: file:/home/cos/demo/t_hive.txt

Loading data to table default.t_hive

Deleted hdfs://c1.wtmart.com:9000/user/hive/warehouse/t_hive

Time taken: 0.397 seconds

View tables and data

# List tables

hive> show tables;

t_hive

Time taken: 0.099 seconds

# Match table names with a wildcard pattern

hive> show tables '*t*';

t_hive

Time taken: 0.065 seconds
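The pattern accepted by `show tables` is narrower than a full regular expression. As a rough Python sketch of the matching, assuming that only `*` (any run of characters) and `|` (alternatives) are special and that names compare case-insensitively, the pattern can be translated to a regex as follows. The helper name `hive_pattern_to_regex` and the sample table list are illustrative only.

```python
import re


def hive_pattern_to_regex(pattern):
    """Translate a SHOW TABLES style pattern into a compiled regex.

    Assumption: '*' matches any run of characters, '|' separates
    alternatives, and every other character is matched literally.
    """
    alternatives = []
    for alt in pattern.split("|"):
        alternatives.append(".*".join(re.escape(part) for part in alt.split("*")))
    return re.compile("^(?:" + "|".join(alternatives) + ")$", re.IGNORECASE)


tables = ["t_hive", "t_hive2", "logs"]  # hypothetical table names
matcher = hive_pattern_to_regex("*t*")
matched = [name for name in tables if matcher.match(name)]
print(matched)
```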

# View the table data

hive> select * from t_hive;

16 2 3

61 12 13

41 2 31

17 21 3

71 2 31

1 12 34

11 2 34

Time taken: 0.264 seconds
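To make the role of `ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'` concrete, here is a small Python sketch of how a tab-delimited line maps onto the three `int` columns. It only models the splitting and casting; it is not how Hive is implemented, and `parse_rows` is a hypothetical helper.

```python
# Contents of t_hive.txt, with real tab characters between the fields.
RAW = "16\t2\t3\n61\t12\t13\n41\t2\t31\n17\t21\t3\n71\t2\t31\n1\t12\t34\n11\t2\t34\n"


def parse_rows(text, delimiter="\t"):
    """Split each line on the delimiter and cast the three fields to int."""
    rows = []
    for line in text.splitlines():
        a, b, c = (int(field) for field in line.split(delimiter))
        rows.append((a, b, c))
    return rows


rows = parse_rows(RAW)
for row in rows:
    print(*row, sep="\t")
```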

# View the table schema

hive> desc t_hive;

a int

b int

c int

Time taken: 0.1 seconds

Alter a table

# Add a column

hive> ALTER TABLE t_hive ADD COLUMNS (new_col String);

Time taken: 0.186 seconds

hive> desc t_hive;

a int

b int

c int

new_col string

Time taken: 0.086 seconds

# Rename the table

hive> ALTER TABLE t_hive RENAME TO t_hadoop;

Time taken: 0.45 seconds

hive> show tables;

t_hadoop

Time taken: 0.07 seconds

Drop a table

hive> DROP TABLE t_hadoop;

Time taken: 0.767 seconds

hive> show tables;

Time taken: 0.064 seconds

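3. Hive interactive mode

quit, exit: leave the interactive shell

reset: reset the configuration to its default values

set <key>=<value>: set the value of a particular configuration variable (a misspelled variable name does not raise an error)

set: print the Hive configuration variables overridden by the user

set -v: print all Hadoop and Hive configuration variables

add FILE[S] <filepath>*, add JAR[S] <filepath>*, add ARCHIVE[S] <filepath>*: add one or more files, jars, or archives to the distributed cache

list FILE[S], list JAR[S], list ARCHIVE[S]: list the resources already added to the distributed cache

list FILE[S] <filepath>*, list JAR[S] <filepath>*, list ARCHIVE[S] <filepath>*: check whether the given resources have been added to the distributed cache

delete FILE[S] <filepath>*, delete JAR[S] <filepath>*, delete ARCHIVE[S] <filepath>*: remove the given resources from the distributed cache

! <command>: run a shell command from the Hive shell

dfs <dfs command>: run a dfs command from the Hive shell

<query string>: run a Hive query and print the result to standard output

source FILE <filepath>: run a Hive script file inside the CLI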

4. Data import

We continue with the t_hive example from above.

# Create the table schema

hive> CREATE TABLE t_hive (a int, b int, c int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

Load data from the local file system (LOCAL)

hive> LOAD DATA LOCAL INPATH '/home/cos/demo/t_hive.txt' OVERWRITE INTO TABLE t_hive ;

Copying data from file:/home/cos/demo/t_hive.txt

Copying file: file:/home/cos/demo/t_hive.txt

Loading data to table default.t_hive

Deleted hdfs://c1.wtmart.com:9000/user/hive/warehouse/t_hive

Time taken: 0.612 seconds

# Find the newly loaded data in HDFS

~ hadoop fs -cat /user/hive/warehouse/t_hive/t_hive.txt

16 2 3

61 12 13

41 2 31

17 21 3

71 2 31

1 12 34

11 2 34

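Load data from HDFS

# Create table t_hive2

hive> CREATE TABLE t_hive2 (a int, b int, c int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

# Load data from HDFS (without LOCAL, Hive moves the source file into the table directory instead of copying it)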

hive> LOAD DATA INPATH '/user/hive/warehouse/t_hive/t_hive.txt' OVERWRITE INTO TABLE t_hive2;

Loading data to table default.t_hive2

Deleted hdfs://c1.wtmart.com:9000/user/hive/warehouse/t_hive2

Time taken: 0.325 seconds

# View the data

hive> select * from t_hive2;

16 2 3

61 12 13

41 2 31

17 21 3

71 2 31

1 12 34

11 2 34

Time taken: 0.287 seconds
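The two load commands differ in what happens to the source file: `LOAD DATA LOCAL INPATH` copies a file from the local file system, while `LOAD DATA INPATH` moves a file that is already in HDFS. The Python sketch below models that difference with ordinary directories in a temporary folder. It is an analogy only, it ignores `OVERWRITE`, and `load_data` is a hypothetical helper rather than a Hive API.

```python
import shutil
import tempfile
from pathlib import Path


def load_data(source, table_dir, local):
    """Model LOAD DATA: copy when LOCAL is given, move otherwise."""
    table_dir.mkdir(parents=True, exist_ok=True)
    target = table_dir / source.name
    if local:
        shutil.copy(source, target)            # LOCAL INPATH: source is kept
    else:
        shutil.move(str(source), str(target))  # INPATH: source is moved away
    return target


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    local_file = root / "t_hive.txt"
    local_file.write_text("16\t2\t3\n")

    # LOAD DATA LOCAL INPATH ... INTO TABLE t_hive
    first = load_data(local_file, root / "warehouse" / "t_hive", local=True)
    local_kept = local_file.exists()

    # LOAD DATA INPATH '.../t_hive/t_hive.txt' INTO TABLE t_hive2
    second = load_data(first, root / "warehouse" / "t_hive2", local=False)
    source_left_behind = first.exists()

    print("local file still present:", local_kept)
    print("file still under t_hive:", source_left_behind)
```

In practice this means that after loading from `/user/hive/warehouse/t_hive/t_hive.txt`, the file no longer sits under the `t_hive` directory, so `t_hive` has to be reloaded before it can be queried again.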

Import data from another table

hive> INSERT OVERWRITE TABLE t_hive2 SELECT * FROM t_hive ;

Total MapReduce jobs = 2

Launching Job 1 out of 2

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_201307131407_0002, Tracking URL = http://c1.wtmart.com:50030/jobdetails.jsp?jobid=job_201307131407_0002

Kill Command = /home/cos/toolkit/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=hdfs://c1.wtmart.com:9001 -kill job_201307131407_0002

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0

2013-07-16 10:32:41,979 Stage-1 map = 0%, reduce = 0%

2013-07-16 10:32:48,034 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.03 sec

2013-07-16 10:32:49,050 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.03 sec

2013-07-16 10:32:50,068 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.03 sec

2013-07-16 10:32:51,082 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.03 sec

2013-07-16 10:32:52,093 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.03 sec

2013-07-16 10:32:53,102 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.03 sec

2013-07-16 10:32:54,112 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.03 sec

MapReduce Total cumulative CPU time: 1 seconds 30 msec

Ended Job = job_201307131407_0002

Ended Job = -314818888, job is filtered out (removed at runtime).

Moving data to: hdfs://c1.wtmart.com:9000/tmp/hive-cos/hive_2013-07-16_10-32-31_323_5732404975764014154/-ext-10000

Loading data to table default.t_hive2

Deleted hdfs://c1.wtmart.com:9000/user/hive/warehouse/t_hive2

Table default.t_hive2 stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 56, raw_data_size: 0]

7 Rows loaded to t_hive2

MapReduce Jobs Launched:

Job 0: Map: 1 Cumulative CPU: 1.03 sec HDFS Read: 273 HDFS Write: 56 SUCCESS

Total MapReduce CPU Time Spent: 1 seconds 30 msec

Time taken: 23.227 seconds

hive> select * from t_hive2;

16 2 3

61 12 13

41 2 31

17 21 3

71 2 31

1 12 34

11 2 34

Time taken: 0.134 seconds

Create a table and import data from another table

# Drop the table

hive> DROP TABLE t_hive;

# Create the table and populate it from another table

hive> CREATE TABLE t_hive AS SELECT * FROM t_hive2 ;

Total MapReduce jobs = 2

Launching Job 1 out of 2

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_201307131407_0003, Tracking URL = http://c1.wtmart.com:50030/jobdetails.jsp?jobid=job_201307131407_0003

Kill Command = /home/cos/toolkit/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=hdfs://c1.wtmart.com:9001 -kill job_201307131407_0003

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0

2013-07-16 10:36:48,612 Stage-1 map = 0%, reduce = 0%

2013-07-16 10:36:54,648 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec

2013-07-16 10:36:55,657 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec

2013-07-16 10:36:56,666 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec

2013-07-16 10:36:57,673 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec

2013-07-16 10:36:58,683 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec

2013-07-16 10:36:59,691 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.13 sec

MapReduce Total cumulative CPU time: 1 seconds 130 msec

Ended Job = job_201307131407_0003

Ended Job = -670956236, job is filtered out (removed at runtime).

Moving data to: hdfs://c1.wtmart.com:9000/tmp/hive-cos/hive_2013-07-16_10-36-39_986_1343249562812540343/-ext-10001

Moving data to: hdfs://c1.wtmart.com:9000/user/hive/warehouse/t_hive

Table default.t_hive stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 56, raw_data_size: 0]

7 Rows loaded to hdfs://c1.wtmart.com:9000/tmp/hive-cos/hive_2013-07-16_10-36-39_986_1343249562812540343/-ext-10000

MapReduce Jobs Launched:

Job 0: Map: 1 Cumulative CPU: 1.13 sec HDFS Read: 272 HDFS Write: 56 SUCCESS

Total MapReduce CPU Time Spent: 1 seconds 130 msec

Time taken: 20.13 seconds

hive> select * from t_hive;

16 2 3

61 12 13

41 2 31

17 21 3

71 2 31

1 12 34

11 2 34

Time taken: 0.109 seconds

Copy only the table schema, without the data

hive> CREATE TABLE t_hive3 LIKE t_hive;

hive> select * from t_hive3;

Time taken: 0.077 seconds

Import data from a MySQL database

We will cover this when we introduce Sqoop.

5. Data export

Copy from HDFS to another location in HDFS

~ hadoop fs -cp /user/hive/warehouse/t_hive /

~ hadoop fs -ls /t_hive

Found 1 items

-rw-r--r-- 1 cos supergroup 56 2013-07-16 10:41 /t_hive/000000_0

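# CREATE TABLE ... AS SELECT wrote this file with Hive's default field delimiter, Ctrl-A (\001), which the terminal does not display, so the columns run together in the output

~ hadoop fs -cat /t_hive/000000_0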

1623

611213

41231

17213

71231

11234
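The digits above look fused because the file uses Hive's default field delimiter, the non-printing Ctrl-A character (`\x01`), instead of a tab. A short Python sketch, assuming the seven rows of `t_hive` shown earlier, rebuilds the file contents and splits them back into columns. The resulting size matches the `total_size: 56` reported in the load log.

```python
# The seven rows of t_hive, as listed earlier in this guide.
ROWS = [(16, 2, 3), (61, 12, 13), (41, 2, 31), (17, 21, 3),
        (71, 2, 31), (1, 12, 34), (11, 2, 34)]

DELIMITER = "\x01"  # Ctrl-A, Hive's default field separator for text files

# Rebuild the content of 000000_0: one line per row, fields joined by Ctrl-A.
content = "".join(DELIMITER.join(str(v) for v in row) + "\n" for row in ROWS)
print("size in bytes:", len(content.encode("ascii")))

# Split every line on Ctrl-A to recover the separate columns.
parsed = [tuple(int(field) for field in line.split(DELIMITER))
          for line in content.splitlines()]
for row in parsed:
    print(*row, sep="\t")
```

To get a readable export without post-processing, one option is to pipe the file through a tool that replaces `\x01` with a tab.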

Export to the local file system through Hive

hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/t_hive' SELECT * FROM t_hive;

Total MapReduce jobs = 1

Launching Job 1 out of 1

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_201307131407_0005, Tracking URL = http://c1.wtmart.com:50030/jobdetails.jsp?jobid=job_201307131407_0005

Kill Command = /home/cos/toolkit/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=hdfs://c1.wtmart.com:9001 -kill job_201307131407_0005

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0

2013-07-16 10:46:24,774 Stage-1 map = 0%, reduce = 0%

2013-07-16 10:46:30,823 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.87 sec

2013-07-16 10:46:31,833 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.87 sec

2013-07-16 10:46:32,844 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.87 sec

2013-07-16 10:46:33,856 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.87 sec

2013-07-16 10:46:34,865 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.87 sec

2013-07-16 10:46:35,873 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.87 sec

2013-07-16 10:46:36,884 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 0.87 sec

MapReduce Total cumulative CPU time: 870 msec

Ended Job = job_201307131407_0005

Copying data to local directory /tmp/t_hive

7 Rows loaded to /tmp/t_hive

MapReduce Jobs Launched:

Job 0: Map: 1 Cumulative CPU: 0.87 sec HDFS Read: 271 HDFS Write: 56 SUCCESS

Total MapReduce CPU Time Spent: 870 msec

Time taken: 23.369 seconds

# Check the local file system

hive> ! cat /tmp/t_hive/000000_0;

hive> 1623

611213

41231

17213

71231

11234

6. Hive queries: HiveQL

Note: the MapReduce log output is omitted from the examples below.

Basic query: column alias, nested subquery, filter, and limit

hive> FROM (

> SELECT b,c as c2 FROM t_hive

> ) t

> SELECT t.b, t.c2

> WHERE b>2

> LIMIT 2;

12 13

21 3
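The `FROM (...) t SELECT ...` form puts the subquery first and the projection after it. As a cross-check of the result, here is the same logic as a Python sketch over the seven sample rows. It mirrors the query step by step and is not meant to show how Hive executes it.

```python
T_HIVE = [(16, 2, 3), (61, 12, 13), (41, 2, 31), (17, 21, 3),
          (71, 2, 31), (1, 12, 34), (11, 2, 34)]

# Inner query: SELECT b, c AS c2 FROM t_hive
subquery = [{"b": b, "c2": c} for (_a, b, c) in T_HIVE]

# Outer query: SELECT t.b, t.c2 WHERE b > 2 LIMIT 2
result = [(t["b"], t["c2"]) for t in subquery if t["b"] > 2][:2]
print(result)
```

Without an `ORDER BY`, which two rows `LIMIT 2` returns depends on the order in which rows are read; the sketch simply uses the file order.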

Join query: JOIN

hive> SELECT t1.a,t1.b,t2.a,t2.b

> FROM t_hive t1 JOIN t_hive2 t2 on t1.a=t2.a

> WHERE t1.c>10;

1 12 1 12

11 2 11 2

41 2 41 2

61 12 61 12

71 2 71 2
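The join result can be reproduced with a simple hash join in Python. This is a sketch under the assumption that `t_hive2` holds the same seven rows as `t_hive`, as it does at this point in the guide. The output is sorted explicitly for display.

```python
T_HIVE = [(16, 2, 3), (61, 12, 13), (41, 2, 31), (17, 21, 3),
          (71, 2, 31), (1, 12, 34), (11, 2, 34)]
T_HIVE2 = list(T_HIVE)  # assumption: t_hive2 has the same rows as t_hive

# Build a hash index on the join key t2.a.
index = {}
for row in T_HIVE2:
    index.setdefault(row[0], []).append(row)

# SELECT t1.a, t1.b, t2.a, t2.b FROM t1 JOIN t2 ON t1.a = t2.a WHERE t1.c > 10
joined = [
    (t1[0], t1[1], t2[0], t2[1])
    for t1 in T_HIVE
    if t1[2] > 10
    for t2 in index.get(t1[0], [])
]

for row in sorted(joined):
    print(*row)
```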

Aggregate query 1: count, avg

hive> SELECT count(*), avg(a) FROM t_hive;

7 31.142857142857142

Aggregate query 2: count, distinct

hive> SELECT count(DISTINCT b) FROM t_hive;

Aggregate query 3: GROUP BY, HAVING

# GROUP BY

hive> SELECT avg(a),b,sum(c) FROM t_hive GROUP BY b,c;

16.0 2 3

56.0 2 62

11.0 2 34

61.0 12 13

1.0 12 34

17.0 21 3

# HAVING

hive> SELECT avg(a),b,sum(c) FROM t_hive GROUP BY b,c HAVING sum(c)>30;

56.0 2 62

11.0 2 34

1.0 12 34
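Because the query groups by both `b` and `c`, each distinct `(b, c)` pair forms its own group, which is why `b = 2` appears on three output lines. The Python sketch below recomputes both result sets from the seven sample rows. It is an illustration of the grouping semantics, not of Hive's execution plan, and `aggregate` is a hypothetical helper.

```python
from collections import defaultdict

T_HIVE = [(16, 2, 3), (61, 12, 13), (41, 2, 31), (17, 21, 3),
          (71, 2, 31), (1, 12, 34), (11, 2, 34)]

# GROUP BY b, c: collect the rows that share the same (b, c) pair.
groups = defaultdict(list)
for a, b, c in T_HIVE:
    groups[(b, c)].append((a, c))


def aggregate(groups):
    """Compute avg(a), b, sum(c) for every group."""
    out = []
    for (b, _c), members in groups.items():
        avg_a = sum(a for a, _ in members) / len(members)
        sum_c = sum(c for _, c in members)
        out.append((avg_a, b, sum_c))
    return out


grouped = aggregate(groups)
# HAVING sum(c) > 30: filter the groups after aggregation.
having = [row for row in grouped if row[2] > 30]

for row in grouped:
    print(*row)
print("after HAVING:", having)
```

The order of the groups may differ from Hive's output, since neither query specifies an `ORDER BY`.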

7. Hive views

A Hive view is the same concept as a view in a relational database. We again use t_hive as the example.

hive> CREATE VIEW v_hive AS SELECT a,b FROM t_hive where c>30;

hive> select * from v_hive;

41 2

71 2

1 12

11 2
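A view stores the query, not the data, so it is re-evaluated against the base table each time it is read. As a loose Python analogy (a sketch only), the view behaves like a function over the current contents of `t_hive`. The extra row appended at the end is hypothetical and serves only to show that the view picks up new data.

```python
T_HIVE = [(16, 2, 3), (61, 12, 13), (41, 2, 31), (17, 21, 3),
          (71, 2, 31), (1, 12, 34), (11, 2, 34)]


def v_hive():
    """SELECT a, b FROM t_hive WHERE c > 30, evaluated on every read."""
    return [(a, b) for (a, b, c) in T_HIVE if c > 30]


before = v_hive()
print(before)

T_HIVE.append((5, 5, 99))  # hypothetical new row in the base table
after = v_hive()
print(after)
```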

Drop the view

hive> DROP VIEW IF EXISTS v_hive;

Time taken: 0.495 seconds

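8. Hive partitioned tables

Partitioned tables are a basic database concept, but when data volumes are small there is often no need for them. Hive is OLAP data warehouse software that handles very large volumes of data, so partitioned tables matter a great deal in this setting.

Below we define a new table schema: t_hft

Create the data

~ vi /home/cos/demo/t_hft_20130627.csv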

000001,092023,9.76

000002,091947,8.99

000004,092002,9.79

000005,091514,2.2

000001,092008,9.70

000001,092059,9.45

~ vi /home/cos/demo/t_hft_20130628.csv

000001,092023,9.76

000002,091947,8.99

000004,092002,9.79

000005,091514,2.2

000001,092008,9.70

000001,092059,9.45

Create the table

DROP TABLE IF EXISTS t_hft;

CREATE TABLE t_hft(

SecurityID STRING,

tradeTime STRING,

PreClosePx DOUBLE

) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

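Create the partitioned table

The business calls for partitioning by day and stock ID; the table below partitions by trading day (tradeDate) only.

DROP TABLE IF EXISTS t_hft;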

CREATE TABLE t_hft(

SecurityID STRING,

tradeTime STRING,

PreClosePx DOUBLE

) PARTITIONED BY (tradeDate INT)

ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

Load the data

# 20130627

hive> LOAD DATA LOCAL INPATH '/home/cos/demo/t_hft_20130627.csv' OVERWRITE INTO TABLE t_hft PARTITION (tradeDate=20130627);

Copying data from file:/home/cos/demo/t_hft_20130627.csv

Copying file: file:/home/cos/demo/t_hft_20130627.csv

Loading data to table default.t_hft partition (tradedate=20130627)

#20130628

hive> LOAD DATA LOCAL INPATH '/home/cos/demo/t_hft_20130628.csv' OVERWRITE INTO TABLE t_hft PARTITION (tradeDate=20130628);

Copying data from file:/home/cos/demo/t_hft_20130628.csv

Copying file: file:/home/cos/demo/t_hft_20130628.csv

Loading data to table default.t_hft partition (tradedate=20130628)

View the partitions

hive> SHOW PARTITIONS t_hft;

tradedate=20130627

tradedate=20130628

Time taken: 0.082 seconds

Query the data

hive> select * from t_hft where securityid='000001';

000001 092023 9.76 20130627

000001 092008 9.7 20130627

000001 092059 9.45 20130627

000001 092023 9.76 20130628

000001 092008 9.7 20130628

000001 092059 9.45 20130628

hive> select * from t_hft where tradedate=20130627 and PreClosePx<9;

000002 091947 8.99 20130627

000005 091514 2.2 20130627
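Each partition is stored in its own subdirectory named `key=value` under the table directory, so a filter on the partition column lets Hive read only the matching directories. The Python sketch below illustrates that layout and the pruning idea. It assumes the warehouse path configured earlier in this guide, and `partition_path` and `prune` are hypothetical helpers, not Hive functions.

```python
WAREHOUSE = "/user/hive/warehouse"  # hive.metastore.warehouse.dir from above


def partition_path(table, **partition_values):
    """Build the directory for one partition, e.g. .../t_hft/tradedate=20130627."""
    parts = [f"{key.lower()}={value}" for key, value in partition_values.items()]
    return "/".join([WAREHOUSE, table.lower(), *parts])


partitions = {
    20130627: partition_path("t_hft", tradeDate=20130627),
    20130628: partition_path("t_hft", tradeDate=20130628),
}


def prune(partitions, trade_date):
    """Keep only the directories that a filter on tradedate needs to read."""
    return [path for key, path in partitions.items() if key == trade_date]


to_scan = prune(partitions, 20130627)
print(to_scan)
```

The first query above filters only on `securityid`, which is not a partition column, so both partitions have to be scanned; the second filters on `tradedate` and can skip the 20130628 directory.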

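That completes the basics of using Hive; these are all day-to-day operations. In later articles I will go on to cover HiveQL optimization and Hive operations.

Reference: http://blog.fens.me/hadoop-hive-intro/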
