HiveMetaStore Metrics Collection and Alerting
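Background
With the Metastore running in remote mode, we enable Hadoop2 metrics collection and expose a JMX interface. A separate program queries JMX to fetch the basic Hive metrics, and threshold alerts are configured on the core metrics.
Deployment Overview
Enabling JMX Metrics Collection for HiveMetaStore (Hadoop2 Metrics System)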
- Change the following configuration in hive-site.xml:

<!-- Enable the metastore metrics subsystem -->
<property>
  <name>hive.metastore.metrics.enabled</name>
  <value>true</value>
</property>
<!-- Metrics reporter types -->
<property>
  <name>hive.service.metrics.reporter</name>
  <value>JMX,HADOOP2</value>
</property>
<!-- Component name under which metrics are published to the Hadoop2 metrics system -->
<property>
  <name>hive.service.metrics.hadoop2.component</name>
  <value>hivemetastore</value>
</property>
<!-- Reporting interval for the Hadoop2 metrics system -->
<property>
  <name>hive.service.metrics.hadoop2.frequency</name>
  <value>30s</value>
</property>
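Before restarting, it can help to confirm that these properties actually landed in the file the metastore reads. The class below is a minimal sketch of such a check using the JDK's built-in DOM parser. The class and method names are hypothetical. It assumes the standard Hadoop configuration layout of property/name/value elements, and it only checks that metrics are enabled and that both the JMX and HADOOP2 reporters are listed.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: sanity-checks the metrics settings in a hive-site.xml stream
public class HiveSiteMetricsCheck {
    // Reads every <property><name>/<value> pair from a Hadoop-style configuration file
    static Map<String, String> readProperties(InputStream in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        NodeList props = doc.getElementsByTagName("property");
        Map<String, String> result = new HashMap<>();
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
            NodeList values = p.getElementsByTagName("value");
            String value = values.getLength() > 0 ? values.item(0).getTextContent().trim() : "";
            result.put(name, value);
        }
        return result;
    }

    // True when the metrics switch is on and the reporter list contains both JMX and HADOOP2
    static boolean metricsConfigured(Map<String, String> conf) {
        if (!"true".equalsIgnoreCase(conf.get("hive.metastore.metrics.enabled"))) {
            return false;
        }
        String reporters = conf.getOrDefault("hive.service.metrics.reporter", "");
        boolean jmx = false, hadoop2 = false;
        for (String r : reporters.split(",")) {
            jmx |= r.trim().equalsIgnoreCase("JMX");
            hadoop2 |= r.trim().equalsIgnoreCase("HADOOP2");
        }
        return jmx && hadoop2;
    }

    public static void main(String[] args) throws Exception {
        // Inline sample; for a real check pass new FileInputStream(pathToHiveSiteXml) instead
        String xml = "<configuration>"
                + "<property><name>hive.metastore.metrics.enabled</name><value>true</value></property>"
                + "<property><name>hive.service.metrics.reporter</name><value>JMX,HADOOP2</value></property>"
                + "</configuration>";
        Map<String, String> conf = readProperties(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println("metrics configured: " + metricsConfigured(conf));
    }
}
```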
- Modify the ~/hive/bin/hive script:

for j in $SERVICE_LIST ; do
  if [ "$j" = "$SERVICE" ] ; then
    ## >>>>>> Added section: start >>>>>>
    if [ "$SERVICE" = "hiveserver2" ] ; then
      export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -XX:NewSize=1024m -XX:MaxNewSize=1024m -Xms5120m -Xmx5120m -XX:PermSize=100m"
    elif [ "$SERVICE" = "metastore" ] ; then
      export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -XX:+PrintCommandLineFlags -XX:NewSize=2g -XX:MaxNewSize=2g -Xms4g -Xmx4g -XX:PermSize=128m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -verbose:gc -XX:+PrintGCDetails -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintPromotionFailure -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=2 -XX:GCLogFileSize=512M -Xloggc:/opt/hive/logs/gc-metastore.log -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=9082"
    fi
    ## <<<<<< Added section: end <<<<<<
    TORUN=${j}$HELP
  fi
done
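After a restart, one way to confirm that the metastore JVM actually picked up these options is to read the input arguments of the remote java.lang:type=Runtime MXBean over the same JMX port. The following is a minimal sketch that assumes the host and port used in this article. The class name and the REQUIRED list are illustrative choices, and REQUIRED covers only a subset of the flags above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical check: are the metastore JVM options from bin/hive present on the running process?
public class MetastoreJvmFlagsCheck {
    // Illustrative subset of the flags set in the metastore branch of bin/hive
    static final List<String> REQUIRED = Arrays.asList(
            "-Xmx4g", "-XX:+UseConcMarkSweepGC", "-Dcom.sun.management.jmxremote.port=9082");

    // Returns the required flags that are absent from the JVM's input arguments
    static List<String> missingFlags(List<String> inputArgs, List<String> required) {
        List<String> missing = new ArrayList<>();
        for (String flag : required) {
            if (!inputArgs.contains(flag)) {
                missing.add(flag);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://192.168.1.1:9082/jmxrmi");
        try (JMXConnector jmxc = JMXConnectorFactory.connect(url, null)) {
            MBeanServerConnection mbsc = jmxc.getMBeanServerConnection();
            // Proxy for the remote java.lang:type=Runtime MXBean
            RuntimeMXBean runtime = ManagementFactory.newPlatformMXBeanProxy(
                    mbsc, ManagementFactory.RUNTIME_MXBEAN_NAME, RuntimeMXBean.class);
            System.out.println("Missing flags: " + missingFlags(runtime.getInputArguments(), REQUIRED));
        }
    }
}
```

If a flag such as -Xmx appears more than once on the command line (for example because other scripts also add to HADOOP_CLIENT_OPTS), HotSpot uses the last occurrence. A flag being present is therefore not, on its own, proof of the effective value.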
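The jmxremote port must not already be in use by another service. Check it with the following command (no output means the port is free):

netstat -tuln | grep 9082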
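The same check can be done from Java on the metastore host by trying to bind the port. If another process is already listening on it, the bind fails. This is a minimal sketch: the class name is hypothetical, and 9082 is only the default used in this article.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical helper: reports whether a TCP port can still be bound on this host
public class PortCheck {
    // True if a listener can be opened on the port, i.e. no other process holds it
    static boolean isPortFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 9082;
        System.out.println("Port " + port + (isPortFree(port) ? " is free" : " is already in use"));
    }
}
```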
- Modify /hadoop-2.9.1.1/etc/hadoop/hadoop-env.sh:

export HADOOP_CLIENT_OPTS="-Djava.library.path=$HADOOP_COMMON_LIB_NATIVE_DIR $HADOOP_CLIENT_OPTS"
- Start the metastore

From the hive directory, run:

nohup hive --service metastore >>/opt/hive/logs/metastore.log 2>&1 &
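The metastore needs some time to start, so a collector or deployment script may want to wait until the JMX endpoint accepts connections before it starts polling. The following is a minimal retry sketch that assumes the host and port used in this article. The class name is hypothetical, and the attempt count and sleep interval in main are arbitrary examples.

```java
import java.io.IOException;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper: waits until the metastore's JMX endpoint accepts connections
public class WaitForMetastoreJmx {
    // Builds the RMI-based JMX URL used throughout this article
    static String jmxUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    // Tries up to maxAttempts connections, sleeping between failures; true as soon as one succeeds
    static boolean waitForJmx(String host, int port, int maxAttempts, long sleepMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try (JMXConnector c = JMXConnectorFactory.connect(new JMXServiceURL(jmxUrl(host, port)), null)) {
                return true;
            } catch (IOException e) {
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis);
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        // Example: up to 12 attempts, 5 seconds apart
        boolean up = waitForJmx("192.168.1.1", 9082, 12, 5000);
        System.out.println(up ? "JMX endpoint is up" : "JMX endpoint not reachable");
    }
}
```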
Metrics Monitoring
Add a HiveMetaStore monitoring task in the Huatuo (華佗) web console to collect metrics from, and raise alerts for, every HiveMetaStore node.
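How Huatuo schedules its collection is not described here. The following is therefore only a generic sketch of per-node polling with a ScheduledExecutorService, not Huatuo's implementation. The class name and the second node address are hypothetical. The 30-second period is chosen to match hive.service.metrics.hadoop2.frequency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical poller: reads one metric from every metastore node on a fixed schedule
public class MetastorePoller {
    // Turns "host:port,host:port" into one JMX service URL per metastore node
    static List<String> toJmxUrls(String nodes) {
        List<String> urls = new ArrayList<>();
        for (String node : nodes.split(",")) {
            String n = node.trim();
            if (!n.isEmpty()) {
                urls.add("service:jmx:rmi:///jndi/rmi://" + n + "/jmxrmi");
            }
        }
        return urls;
    }

    // Reads open_connections from one node; failures are printed instead of propagated
    static void pollOnce(String url) {
        try (JMXConnector jmxc = JMXConnectorFactory.connect(new JMXServiceURL(url), null)) {
            MBeanServerConnection mbsc = jmxc.getMBeanServerConnection();
            ObjectName name = new ObjectName("Hadoop:service=hivemetastore,name=hivemetastore");
            System.out.println(url + " open_connections=" + mbsc.getAttribute(name, "open_connections"));
        } catch (Exception e) {
            System.out.println(url + " unreachable: " + e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical node list; the second address is only an illustration
        List<String> urls = toJmxUrls("192.168.1.1:9082,192.168.1.2:9082");
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Every 30 s, matching hive.service.metrics.hadoop2.frequency in hive-site.xml
        scheduler.scheduleAtFixedRate(() -> urls.forEach(MetastorePoller::pollOnce), 0, 30, TimeUnit.SECONDS);
    }
}
```

Each node is polled inside its own try/catch. With scheduleAtFixedRate, an uncaught exception suppresses all later runs of the task, so a single unreachable node would otherwise stop collection for every node.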
Querying Metrics
import javax.management.*;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;
import java.io.IOException;
import java.math.BigDecimal;
import java.math.RoundingMode;

public class HiveMetaStoreMetric {
    public static void main(String[] args) throws IOException, JMException {
        // JMX connection URL of the MetaStore
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://192.168.1.1:9082/jmxrmi");
        // try-with-resources closes the connection even if a query throws
        try (JMXConnector jmxc = JMXConnectorFactory.connect(url, null)) {
            MBeanServerConnection mbsc = jmxc.getMBeanServerConnection();
            // The hivemetastore bean published through the Hadoop2 metrics system
            ObjectName query = new ObjectName("Hadoop:service=hivemetastore,name=hivemetastore");
            MBeanInfo minfo = mbsc.getMBeanInfo(query);
            // Iterate over all metrics
            for (MBeanAttributeInfo attributeInfo : minfo.getAttributes()) {
                Object value = mbsc.getAttribute(query, attributeInfo.getName());
                // Value types differ per metric (Long, Integer, Double, String, ...);
                // round finite Doubles to 2 decimal places (NaN/Infinity would make BigDecimal throw)
                if (value instanceof Double && Double.isFinite((Double) value)) {
                    value = BigDecimal.valueOf((Double) value).setScale(2, RoundingMode.HALF_UP).doubleValue();
                }
                System.out.println(attributeInfo.getName() + " " + value);
            }
        }
    }
}
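When only a few attributes are needed, MBeanServerConnection.getAttributes can fetch them in a single call instead of making one getAttribute round trip per attribute. This is a minimal sketch that assumes the same endpoint and MBean as the program above. The class name is hypothetical, and the attribute subset in CORE is an illustrative choice.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.management.Attribute;
import javax.management.AttributeList;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical variant of HiveMetaStoreMetric that fetches only selected attributes
public class HiveMetaStoreCoreMetrics {
    // Illustrative subset of the core metrics
    static final String[] CORE = {"gc.ParNew.count", "gc.ParNew.time", "memory.heap.usage", "open_connections"};

    // Converts an AttributeList into name -> value, preserving order
    static Map<String, Object> toMap(AttributeList list) {
        Map<String, Object> values = new LinkedHashMap<>();
        for (Attribute a : list.asList()) {
            values.put(a.getName(), a.getValue());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://192.168.1.1:9082/jmxrmi");
        try (JMXConnector jmxc = JMXConnectorFactory.connect(url, null)) {
            MBeanServerConnection mbsc = jmxc.getMBeanServerConnection();
            ObjectName name = new ObjectName("Hadoop:service=hivemetastore,name=hivemetastore");
            // One round trip for all requested attributes
            System.out.println(toMap(mbsc.getAttributes(name, CORE)));
        }
    }
}
```

Attributes that cannot be read are omitted from the returned AttributeList rather than raising an exception. A caller should therefore treat a missing key as "no data".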
Sample output:
tag.rate_unit events/second
tag.duration_unit milliseconds
tag.Hostname bdsitapp255
buffers.direct.capacity 0
buffers.direct.count 0
buffers.direct.used 0
buffers.mapped.capacity 0
buffers.mapped.count 0
buffers.mapped.used 0
classLoading.loaded 7219
classLoading.unloaded 0
gc.ConcurrentMarkSweep.count 1
gc.ConcurrentMarkSweep.time 146
gc.ParNew.count 75
gc.ParNew.time 1065
init_total_count_dbs 489
init_total_count_partitions 51089
init_total_count_tables 13733
memory.heap.committed 4080271360
memory.heap.init 4294967296
memory.heap.max 4080271360
memory.heap.usage 0.06
memory.heap.used 236619048
memory.non-heap.committed 79024128
memory.non-heap.init 2555904
memory.non-heap.max -1
memory.non-heap.usage -7.7437264E7
memory.non-heap.used 77437264
memory.pools.CMS-Old-Gen.usage 0.01
memory.pools.Code-Cache.usage 0.09
memory.pools.Compressed-Class-Space.usage 0.0
memory.pools.Metaspace.usage 0.98
memory.pools.Par-Eden-Space.usage 0.13
memory.pools.Par-Survivor-Space.usage 0.01
memory.total.committed 4159295488
memory.total.init 4297523200
memory.total.max 4080271359
memory.total.used 314056312
threads.blocked.count 0
threads.count 223
threads.daemon.count 22
threads.deadlock.count 0
threads.new.count 0
threads.runnable.count 8
threads.terminated.count 0
threads.timed_waiting.count 9
threads.waiting.count 206
active_calls_api_get_database 0
active_calls_api_get_tables 0
active_calls_api_init 0
active_calls_api_set_ugi 0
jvm.pause.extraSleepTime 240
open_connections 1
api_get_database_count 10446
api_get_database_mean_rate 0.02
api_get_database_1min_rate 0.01
api_get_database_5min_rate 0.02
api_get_database_15min_rate 0.02
api_get_database_mean 9.24
api_get_database_min 8.45
api_get_database_max 13.3
api_get_database_median 9.23
api_get_database_stddev 0.63
api_get_database_75thpercentile 9.95
api_get_database_95thpercentile 10.1
api_get_database_98thpercentile 10.1
api_get_database_99thpercentile 10.1
api_get_database_999thpercentile 10.1
api_get_tables_count 3482
api_get_tables_mean_rate 0.01
api_get_tables_1min_rate 0.0
api_get_tables_5min_rate 0.01
api_get_tables_15min_rate 0.01
api_get_tables_mean 7.76
api_get_tables_min 7.31
api_get_tables_max 9.18
api_get_tables_median 7.79
api_get_tables_stddev 0.04
api_get_tables_75thpercentile 7.79
api_get_tables_95thpercentile 7.79
api_get_tables_98thpercentile 7.79
api_get_tables_99thpercentile 7.89
api_get_tables_999thpercentile 8.0
api_init_count 1
api_init_mean_rate 0.0
api_init_1min_rate 0.0
api_init_5min_rate 0.0
api_init_15min_rate 0.0
api_init_mean 3519.03
api_init_min 3519.03
api_init_max 3519.03
api_init_median 3519.03
api_init_stddev 0.0
api_init_75thpercentile 3519.03
api_init_95thpercentile 3519.03
api_init_98thpercentile 3519.03
api_init_99thpercentile 3519.03
api_init_999thpercentile 3519.03
api_set_ugi_count 1
api_set_ugi_mean_rate 0.0
api_set_ugi_1min_rate 0.0
api_set_ugi_5min_rate 0.0
api_set_ugi_15min_rate 0.0
api_set_ugi_mean 0.26
api_set_ugi_min 0.26
api_set_ugi_max 0.26
api_set_ugi_median 0.26
api_set_ugi_stddev 0.0
api_set_ugi_75thpercentile 0.26
api_set_ugi_95thpercentile 0.26
api_set_ugi_98thpercentile 0.26
api_set_ugi_99thpercentile 0.26
api_set_ugi_999thpercentile 0.26
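Core Metric Selection
- gc.ParNew.count: number of young-generation (ParNew) GCs; used to compute the average GC time
- gc.ParNew.time: total young-generation GC time in ms; used to compute the average GC time
- memory.heap.usage: fraction of the heap in use
- open_connections: number of currently open connections
- active_calls_api_create_table: number of in-flight create-table requests
- active_calls_api_drop_table: number of in-flight drop-table requests
- active_calls_api_alter_table: number of in-flight alter-table requests
- api_get_tables_mean: mean get_tables request time, ms
- api_get_database_mean: mean get_database request time, ms
- api_get_table_mean: mean get_table request time, ms
- api_get_databases_mean: mean get_databases request time, ms
- api_get_multi_table_mean: mean get_multi_table request time, ms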
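gc.ParNew.count and gc.ParNew.time are cumulative since JVM start. Dividing them therefore gives a lifetime average (1065 / 75 = 14.2 ms in the sample output), which reacts slowly to a recent problem. For alerting it is usually more useful to divide the differences between two consecutive samples. This is a minimal sketch: the class name is hypothetical, and the second sample in main is invented for illustration.

```java
// Hypothetical helper: average young-GC time from the cumulative ParNew counters
public class GcPauseAverage {
    // Average GC time in ms between two samples of (gc.ParNew.count, gc.ParNew.time).
    // Returns 0 when no GC happened in the interval; a negative delta (e.g. counters
    // reset by a restart) also yields 0.
    static double intervalAvgMillis(long prevCount, long prevTimeMs, long curCount, long curTimeMs) {
        long gcs = curCount - prevCount;
        return gcs > 0 ? (double) (curTimeMs - prevTimeMs) / gcs : 0.0;
    }

    public static void main(String[] args) {
        // Lifetime average from the sample output: gc.ParNew.time / gc.ParNew.count
        System.out.println(1065.0 / 75);
        // Hypothetical next sample 30 s later: count 80, time 1165 -> (1165 - 1065) / (80 - 75)
        System.out.println(intervalAvgMillis(75, 1065, 80, 1165));
    }
}
```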
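Alerting
Alert thresholds for the metrics above can be configured in SCM so that abnormal metric values raise alerts. A sample alert:
Alert time: 2024-04-20 20:40:00
Severity: Critical
Environment: PRD
Event ID: HiveMetaStore-Metris-open_connections-192.168.1.1
Alert content: [2024-04-20 20:49:10] HiveMetaStore node (192.168.1.1) metric open_connections (1000) is abnormal. [2024-04-20 20:39:10] HiveMetaStore node (192.168.1.1) metric open_connections (1002) is abnormal.
Event count: 2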
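The thresholds themselves live in SCM, whose configuration format is not covered here. The class below only illustrates the underlying comparison and produces messages shaped like the alert content above. The class name and both threshold values are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of an upper-bound threshold check on collected metrics
public class ThresholdAlert {
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Returns one message per metric whose value exceeds its threshold
    static List<String> check(String host, Map<String, Double> metrics,
                              Map<String, Double> thresholds, LocalDateTime at) {
        List<String> alerts = new ArrayList<>();
        for (Map.Entry<String, Double> t : thresholds.entrySet()) {
            Double v = metrics.get(t.getKey());
            if (v != null && v > t.getValue()) {
                alerts.add(String.format("[%s] HiveMetaStore node (%s) metric %s (%s) is abnormal.",
                        at.format(FMT), host, t.getKey(), fmt(v)));
            }
        }
        return alerts;
    }

    // Prints whole numbers without a trailing ".0", as in the sample alert
    static String fmt(double v) {
        return v == Math.rint(v) ? String.valueOf((long) v) : String.valueOf(v);
    }

    public static void main(String[] args) {
        Map<String, Double> thresholds = new LinkedHashMap<>();
        thresholds.put("open_connections", 800.0);  // hypothetical threshold
        thresholds.put("memory.heap.usage", 0.85);  // hypothetical threshold
        Map<String, Double> metrics = new LinkedHashMap<>();
        metrics.put("open_connections", 1002.0);
        metrics.put("memory.heap.usage", 0.06);
        check("192.168.1.1", metrics, thresholds, LocalDateTime.of(2024, 4, 20, 20, 39, 10))
                .forEach(System.out::println);
    }
}
```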
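Open Issues
- The production (PRD) environment exposes many more metrics than the example above. Does all of the metric data need to be stored? If so, which storage medium should be used?
As a temporary measure, the size of each metric payload is computed and written directly to the log file.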
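This is a minimal sketch of that temporary measurement. It serializes the collected metrics in the same "name value" line layout that the query program prints and counts the UTF-8 bytes. The class name and the two sample metrics are illustrative assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: measures the size of one metric payload for logging
public class MetricPayloadSize {
    // Serializes metrics as "name value" lines, one per metric
    static String toPayload(Map<String, Object> metrics) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Object> e : metrics.entrySet()) {
            sb.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    // Size of the payload in bytes when encoded as UTF-8
    static int payloadBytes(Map<String, Object> metrics) {
        return toPayload(metrics).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        Map<String, Object> metrics = new LinkedHashMap<>();
        metrics.put("open_connections", 1L);
        metrics.put("threads.count", 223L);
        // Log line a collector could write after each poll
        System.out.println("metric payload: " + metrics.size() + " items, " + payloadBytes(metrics) + " bytes");
    }
}
```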