Reference for this post (pinned at the top): "Presto: Running a Standalone Hive Metastore in Docker to Manage MinIO (S3)", BigDataToAI's blog on CSDN
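I. Background
Our team is upgrading its big data architecture and retiring Hadoop: MinIO provides the storage layer, and Trino or Spark access MinIO from the application layer. Trino needs a Hive metastore service to access MinIO. After some investigation, it turns out that HMS (Hive Metastore Service) can run independently of the other Hive components, so there is no need to install Hive as a whole: deploying HMS alone is enough for Trino to access MinIO through it.
II. Environment and steps
1. A CentOS 7 server with Docker installed, IP address 10.38.199.202
2. MySQL 5.7.35 as the metadata store for HMS, deployed with Docker
3. HMS deployed with Docker, here on a separate server, IP address 10.38.199.201
4. The MinIO object storage service (skipped in this post; an existing service is used)
5. Trino, configured with the metastore service and MinIO access, deployed at IP address 10.38.199.203
III. Deploy the MySQL service
1. Pull the MySQL image, version 5.7.35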
docker pull mysql:5.7.35
2. List the image
docker images|grep mysql
3. Start the MySQL service
docker run -p 3306:3306 --name mysql --privileged=true \
-v /usr/local/mysql/log:/var/log/mysql \
-v /usr/local/mysql/data:/var/lib/mysql \
-v /usr/local/mysql/conf:/etc/mysql \
-v /etc/localtime:/etc/localtime:ro \
-e MYSQL_ROOT_PASSWORD=123456 -d mysql:5.7.35
4. Check the MySQL container
docker ps|grep mysql
5. View the MySQL logs
docker logs -f mysql
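The service starts successfully.
6. Connect to the MySQL service with DBeaver; the connection succeeds and the databases can be listed
7. Create an empty database named metastore for HMS; HMS will store its metadata in it later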
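Step 7 does not show the statement itself, so here is a minimal sketch of one way to create the database from the command line instead of DBeaver. It assumes the container name `mysql` and the root password from step 3; the file name `create_metastore_db.sql` is only illustrative.

```shell
# Write the statements to a file (the file name is hypothetical).
cat > create_metastore_db.sql <<'EOF'
CREATE DATABASE IF NOT EXISTS metastore;
SHOW DATABASES;
EOF

# Run the file inside the "mysql" container from step 3, if it is running on this host.
if command -v docker >/dev/null 2>&1 \
   && docker ps --format '{{.Names}}' 2>/dev/null | grep -qx mysql; then
  docker exec -i mysql mysql -uroot -p123456 < create_metastore_db.sql
else
  echo "mysql container not found here; run create_metastore_db.sql manually"
fi
```

The database is left empty on purpose: the schema is created later by schematool when the HMS image is built.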
IV. Deploy HMS, the metastore service
1. Packages needed before deployment
- JDK package: download it yourself, version 1.8 or later
- HMS package hive-standalone-metastore-3.1.2-bin.tar.gz, from Central Repository: org/apache/hive/hive-standalone-metastore/3.1.2
- MySQL connector JAR mysql-connector-java-5.1.49.jar, from https://mvnrepository.com/artifact/mysql/mysql-connector-java/5.1.49
- Hadoop package hadoop-3.2.2.tar.gz, which HMS depends on, from Index of /dist/hadoop/common/hadoop-3.2.2
2. Write metastore-site.xml
Note the following parameters before writing it.
MinIO parameters (required)
- fs.s3a.endpoint
- fs.s3a.access.key
- fs.s3a.secret.key
MySQL parameters
- javax.jdo.option.ConnectionURL
- javax.jdo.option.ConnectionUserName
- javax.jdo.option.ConnectionPassword
Metastore parameters
- metastore.thrift.uris: the URL at which the metastore service will be published
- metastore.warehouse.dir: where Hive table data is stored
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.s3a.access.key</name>
    <value>bymOcmUZ6K8n5ApBu7Ee</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>lVtSARGXqypPpCRQ7LesGsfhRw3dE4imZoBs8ydS</value>
  </property>
  <property>
    <name>fs.s3a.connection.ssl.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>fs.s3a.path.style.access</name>
    <value>true</value>
  </property>
  <property>
    <name>fs.s3a.endpoint</name>
    <value>http://10.38.199.211:9000</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <!-- "&" must be escaped as "&amp;" inside XML -->
    <value>jdbc:mysql://10.38.199.202:3306/metastore?useSSL=false&amp;serverTimezone=UTC</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
  <property>
    <name>hive.metastore.event.db.notification.api.auth</name>
    <value>false</value>
  </property>
  <property>
    <name>metastore.thrift.uris</name>
    <value>thrift://10.38.199.201:9083</value>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
  </property>
  <property>
    <name>metastore.task.threads.always</name>
    <value>org.apache.hadoop.hive.metastore.events.EventCleanerTask</value>
  </property>
  <property>
    <name>metastore.expression.proxy</name>
    <value>org.apache.hadoop.hive.metastore.DefaultPartitionExpressionProxy</value>
  </property>
  <property>
    <name>metastore.warehouse.dir</name>
    <value>/tmp</value>
  </property>
</configuration>
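Put these packages and the configuration file in the same directory. The file must be named metastore-site.xml, because that is the name the Dockerfile adds to the image.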
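Before building, it can help to confirm that the build directory is complete. The sketch below is one way to do that; the file names are taken from the `ADD` lines of the Dockerfile in step 3 (including `jdk.tar.gz`, the name the JDK archive is expected to have), while the function name `check_build_context` is hypothetical.

```shell
# Files the Dockerfile in step 3 expects to find in the build directory.
required_files="jdk.tar.gz
hive-standalone-metastore-3.1.2-bin.tar.gz
hadoop-3.2.2.tar.gz
mysql-connector-java-5.1.49.jar
metastore-site.xml"

# check_build_context DIR: print each missing file; return 1 if any is missing.
check_build_context() {
  dir="$1"
  missing=0
  for f in $required_files; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f"
      missing=1
    fi
  done
  return "$missing"
}

# Check the current directory and report the overall result.
if check_build_context .; then
  echo "build directory is complete"
else
  echo "build directory is incomplete"
fi
```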
3. Create the Dockerfile
The Dockerfile contents:
FROM centos:centos7
WORKDIR /install
ADD jdk.tar.gz /install
RUN pwd
ADD hive-standalone-metastore-3.1.2-bin.tar.gz /install
RUN mv /install/apache-hive-metastore-3.1.2-bin metastore
ADD hadoop-3.2.2.tar.gz /install
#RUN mv /install/hadoop-3.2.2 hadoop
RUN ls
ADD mysql-connector-java-5.1.49.jar ./metastore/lib
ENV JAVA_HOME=/install/jdk
ENV HADOOP_HOME=/install/hadoop-3.2.2
RUN rm -f /install/metastore/lib/guava-19.0.jar \
 && cp ${HADOOP_HOME}/share/hadoop/common/lib/guava-27.0-jre.jar /install/metastore/lib \
 && cp ${HADOOP_HOME}/share/hadoop/tools/lib/hadoop-aws-3.2.2.jar /install/metastore/lib \
 && cp ${HADOOP_HOME}/share/hadoop/tools/lib/aws-java-sdk-bundle-*.jar /install/metastore/lib
# copy Hive metastore configuration file
ADD metastore-site.xml /install/metastore/conf/
# Hive metastore data folder
VOLUME ["/tmp"]
WORKDIR /install/metastore
RUN bin/schematool -initSchema -dbType mysql
CMD ["/install/metastore/bin/start-metastore"]
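Build the image. Because the Dockerfile runs schematool as a build step, building the image also creates the base tables in the metastore database in MySQL.
Check that the base tables were created in the metastore database in MySQL; they were.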
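The build and verification commands are not shown above, so the following is a sketch of what they might look like. The image tag is the one used by `docker run` in step 4; the container name and credentials come from the MySQL section; the function names `build_hms_image` and `show_metastore_tables` are hypothetical wrappers, defined here so that each can be run on the appropriate host.

```shell
IMAGE="minio-hive-standalone-metastore:v1.0"  # tag expected by docker run in step 4

# On the HMS host (10.38.199.201), from the directory that holds the Dockerfile.
# schematool runs as a build step, so MySQL must be reachable during the build.
build_hms_image() {
  docker build -t "$IMAGE" .
}

# On the MySQL host (10.38.199.202): list the tables schematool created.
show_metastore_tables() {
  docker exec mysql mysql -uroot -p123456 -e 'SHOW TABLES;' metastore
}

echo "run build_hms_image on the HMS host, then show_metastore_tables on the MySQL host"
```

One consequence of initializing the schema at build time is that rebuilding the image against a database that already contains the tables is likely to make the schematool step fail, so the database may need to be recreated empty before a rebuild.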
4. Start the metastore service container and check its status
# Start the container
docker run -d -p 9083:9083/tcp --name minio-hive-metastore minio-hive-standalone-metastore:v1.0
# List the container
docker ps|grep minio-hive-metastore
V. Deploy and configure Trino
Single-node Trino deployment is not covered here. The catalog is configured through a newly added catalog file, hiveminio.properties.
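The contents of the catalog file are not reproduced in this post. As a rough sketch only, a Hive catalog pointing at this HMS and MinIO could look like the file written below. The metastore URI and MinIO endpoint are the ones from metastore-site.xml; the access and secret keys are placeholders; the `etc/catalog` path assumes the command is run from the Trino installation directory. The `hive.s3.*` names are the legacy S3 properties of the Trino Hive connector; very old releases use a different connector name and newer releases have moved to a different set of S3 properties, so check the documentation for the Trino version in use.

```shell
# Assumed to be run from the Trino installation directory.
mkdir -p etc/catalog

# Write the catalog file; the key values are placeholders to replace.
cat > etc/catalog/hiveminio.properties <<'EOF'
connector.name=hive
hive.metastore.uri=thrift://10.38.199.201:9083
hive.s3.endpoint=http://10.38.199.211:9000
hive.s3.aws-access-key=<minio-access-key>
hive.s3.aws-secret-key=<minio-secret-key>
hive.s3.path-style-access=true
hive.s3.ssl.enabled=false
EOF
```

Path-style access and disabled SSL mirror the fs.s3a settings used for HMS, since MinIO is served here over plain HTTP at a fixed host and port.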
Start Trino
bin/launcher start
VI. Test Trino access to MinIO through HMS
1. Open the Trino console
./trino --server http://10.38.199.203:8091 --catalog hiveminio --schema default
show schemas;show tables;
2. Create a schema "zytest" that points to the MinIO bucket "zytest", create a table sample_table in it, insert one row, and check in the MinIO console that the data was written
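The statements for this step are not shown, so here is a sketch of what they could be. The schema, bucket, and table names come from the step above; the column names, the inserted values, the Parquet format, and the file name `zytest_demo.sql` are assumptions made for illustration.

```shell
# Write the test statements to a file (file name and columns are hypothetical).
cat > zytest_demo.sql <<'EOF'
CREATE SCHEMA IF NOT EXISTS hiveminio.zytest WITH (location = 's3a://zytest/');
CREATE TABLE IF NOT EXISTS hiveminio.zytest.sample_table (
  col1 varchar,
  col2 varchar
) WITH (format = 'PARQUET');
INSERT INTO hiveminio.zytest.sample_table VALUES ('value1', 'value2');
SELECT * FROM hiveminio.zytest.sample_table;
EOF

# Run the file with the Trino CLI if it is present in the current directory.
if [ -x ./trino ]; then
  ./trino --server http://10.38.199.203:8091 --catalog hiveminio --file zytest_demo.sql
else
  echo "trino CLI not found here; paste zytest_demo.sql into the Trino console instead"
fi
```

After the insert, the data files should appear under the sample_table prefix of the zytest bucket in the MinIO console.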
3. Mounting existing files: under zytest in MinIO, create a new path external_path and put a Parquet file in it
Then create a table in Trino over this directory and query its data
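A table over existing files is usually declared with the Hive connector's `external_location` table property. The sketch below assumes that approach; the table name `external_table`, the columns `id` and `name`, and the file name `external_table.sql` are hypothetical, and the column list has to match the schema of the Parquet file actually uploaded.

```shell
# Write the external-table statements to a file (names and columns are hypothetical).
cat > external_table.sql <<'EOF'
CREATE TABLE IF NOT EXISTS hiveminio.zytest.external_table (
  id bigint,
  name varchar
) WITH (
  external_location = 's3a://zytest/external_path/',
  format = 'PARQUET'
);
SELECT * FROM hiveminio.zytest.external_table LIMIT 10;
EOF

# Run the file with the Trino CLI if it is present in the current directory.
if [ -x ./trino ]; then
  ./trino --server http://10.38.199.203:8091 --catalog hiveminio --file external_table.sql
else
  echo "trino CLI not found here; paste external_table.sql into the Trino console instead"
fi
```

Because the table is external, dropping it in Trino removes only the metadata in HMS and leaves the Parquet file in MinIO untouched.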
This completes the whole test!