Hive Indexes

Creating an index

hive (zmgdb)> create index index_t1 on table v_t1(name)
            > as
            > 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
            > with
            > deferred rebuild in table save_index_t1_table;
OK
Time taken: 0.524 seconds

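save_index_t1_table: the table that stores the index.

That is, a newly created index needs a table to hold it. Each index gets its own index table, and that table is stored in Hadoop (HDFS).
as specifies the index handler. org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler is a fixed class name, and it is the commonly used handler.

Rebuilding the index: because the index was created with deferred rebuild, it stays empty until it is rebuilt, and it must be rebuilt again whenever new data is added. After the rebuild, the index table save_index_t1_table holds the index entries.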

hive (zmgdb)> alter index index_t1 on v_t1 rebuild;

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20160923005139_9caf10f1-5481-4de8-b95a-889c19e45032
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1474540738385_0003, Tracking URL = http://hello110:8088/proxy/application_1474540738385_0003/
Kill Command = /home/hadoop/app/hadoop-2.7.2/bin/hadoop job  -kill job_1474540738385_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2016-09-23 00:51:46,046 Stage-1 map = 0%,  reduce = 0%
2016-09-23 00:51:54,485 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.91 sec
2016-09-23 00:52:00,724 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.76 sec
MapReduce Total cumulative CPU time: 4 seconds 760 msec
Ended Job = job_1474540738385_0003
Loading data to table zmgdb.save_index_t1_table
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 4.76 sec   HDFS Read: 9845 HDFS Write: 426 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 760 msec
OK
Time taken: 22.73 seconds

Examining the index table

hive (zmgdb)> select * from save_index_t1_table;
OK
save_index_t1_table.name    save_index_t1_table._bucketname    save_index_t1_table._offsets
lisi       hdfs://hello110:9000/user/hive/warehouse/zmgdb.db/v_t1/v_t1    [0]
xiaohua    hdfs://hello110:9000/user/hive/warehouse/zmgdb.db/v_t1/v_t1    [49]
xiaoji     hdfs://hello110:9000/user/hive/warehouse/zmgdb.db/v_t1/v_t1    [32]
ximing     hdfs://hello110:9000/user/hive/warehouse/zmgdb.db/v_t1/v_t1    [15]
xx         hdfs://hello110:9000/user/hive/warehouse/zmgdb.db/v_t1/v_t1    [67]
Time taken: 0.073 seconds, Fetched: 5 row(s)

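Each index row stores three things: the indexed key value, the file that contains that value, and the offset of the value within the file.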

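When Hive runs a SELECT that can use the index, for example one filtering on name = 'lisi', it looks the value up in the index, gets the file that contains it and the offset within that file, then opens the file and reads at that offset. What it finds there is the matching row.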
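The lookup described above can be sketched in a few lines of Python. This is only a conceptual illustration, not Hive's implementation: the rows, the "name,age" layout, and the helper names build_compact_index and lookup are all assumptions made for the example, and the file is a local temp file rather than HDFS.

```python
import os
import tempfile

# Hypothetical table contents, one row per line as "name,age".
# These rows are invented for illustration; they are not the real v_t1 data.
ROWS = [b"lisi,20\n", b"ximing,21\n", b"xiaoji,22\n"]


def build_compact_index(path):
    """Scan the file once and map each name to the byte offsets of its rows."""
    index = {}
    with open(path, "rb") as f:
        while True:
            offset = f.tell()          # byte position where this row starts
            line = f.readline()
            if not line:
                break
            name = line.split(b",")[0].decode()
            index.setdefault(name, []).append(offset)
    return index


def lookup(path, index, name):
    """Jump straight to the recorded offsets instead of scanning the file."""
    rows = []
    with open(path, "rb") as f:
        for offset in index.get(name, []):
            f.seek(offset)
            rows.append(f.readline().decode().rstrip("\n"))
    return rows


def demo():
    """Write the sample rows to a temp file, index them, and look one up."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "v_t1")
        with open(path, "wb") as f:
            f.writelines(ROWS)
        index = build_compact_index(path)
        return index, lookup(path, index, "xiaoji")
```

Calling demo() returns the index dictionary together with the rows found for "xiaoji". The dictionary plays the role of the _offsets column shown earlier, and the path plays the role of _bucketname.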

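Without an index, Hive launches a MapReduce job that scans every file under the table's directory. An index works like a book's table of contents: instead of paging through the whole book, you look up the entry and go straight to it, which is usually faster. Note that Hive answers simple SELECT queries without starting MapReduce at all; more complex queries do launch a job.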

Showing a table's indexes

show formatted index on v_t1;

Dropping an index

drop index if exists index_t1 on v_t1;

Note:

Whenever the table's data changes, the table's index must be rebuilt.
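Why the rebuild is needed can be shown with a small Python sketch. It is a conceptual model only, assuming invented "name,age" rows and a hypothetical build_index helper: the index is a snapshot taken at build time, so rows added afterwards are invisible to it until it is built again.

```python
def build_index(data: bytes) -> dict:
    """Scan all of the data once and record name -> [byte offsets of rows]."""
    index = {}
    offset = 0
    for line in data.splitlines(keepends=True):
        name = line.split(b",")[0].decode()
        index.setdefault(name, []).append(offset)
        offset += len(line)
    return index


# Hypothetical rows, invented for illustration.
data = b"lisi,20\nximing,21\n"
index = build_index(data)

# New data arrives; the existing index does not change on its own.
data += b"xx,30\n"
stale_has_xx = "xx" in index      # the old snapshot knows nothing about "xx"

# Rebuilding plays the role of: alter index index_t1 on v_t1 rebuild;
index = build_index(data)
fresh_offsets = index["xx"]       # the new row is now reachable by offset
```

Here stale_has_xx is False before the rebuild, and the "xx" row only becomes reachable after build_index runs again.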
