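Spring Boot Application Monitoring Integration

Spring Boot Application Monitoring Integration Notes

Background

The XScholar literature download application is built on Spring Boot and needs to be connected to a Prometheus monitoring system. The application is already deployed and running on a server, so it has to expose a metrics endpoint for Prometheus to scrape.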

Initial State

Application details

Framework: Spring Boot 2.x
Deployment port: 10089
Server: Linux server (IPv4/IPv6 dual-stack network)
Prometheus: deployed as a Docker container

Existing dependencies

The project already includes the monitoring dependencies:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

Integration Walkthrough

Step 1: Configure the Spring Boot Application

Basic configuration

# application-prod.yml
management:
  endpoints:
    web:
      exposure:
        include: "health,info,prometheus"
  endpoint:
    prometheus:
      enabled: true
    health:
      show-details: always
  metrics:
    export:
      prometheus:
        enabled: true

Key settings

endpoints.web.exposure.include: exposes the prometheus endpoint over HTTP
endpoint.prometheus.enabled: enables the Prometheus actuator endpoint
metrics.export.prometheus.enabled: enables export of metrics in Prometheus format

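Step 2: Network Binding Problem

The serious problem encountered

After the application started, Prometheus could not scrape any data and the target showed as DOWN.

Initial problem configuration

# Problem configuration - no explicit bind address
server:
  port: 10089
  # The application ended up listening only on 127.0.0.1, so external clients could not reach it

Spring Boot normally listens on all interfaces when server.address is unset, so a loopback-only listener like this one usually means the address was set somewhere else (another profile, an environment variable, or a command-line argument).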

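Problem analysis

Local test passes: on the application server, curl localhost:10089/actuator/prometheus returns data normally
Remote access fails: the endpoint cannot be reached from the Prometheus container or from other servers
Network diagnosis: netstat -tlnp | grep 10089 shows that the application is bound only to 127.0.0.1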
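The diagnosis above comes down to reading the local-address column of the listening socket. The following is a minimal sketch of that interpretation, not part of the original setup: the function name classify_bind is hypothetical, and it expects the address column as printed by netstat -tln or ss -tln.

```shell
#!/bin/bash
# classify_bind: hypothetical helper that interprets the "Local Address"
# column of `netstat -tln` / `ss -tln` for a listening socket.
classify_bind() {
  local addr="$1"
  local host="${addr%:*}"   # strip the trailing ":port"
  case "$host" in
    127.*|"[::1]"|::1|localhost) echo "loopback-only" ;;
    0.0.0.0|"*"|::|"[::]")       echo "all-interfaces" ;;
    *)                           echo "specific-interface" ;;
  esac
}

classify_bind "127.0.0.1:10089"   # prints: loopback-only
classify_bind "0.0.0.0:10089"     # prints: all-interfaces
```

A loopback-only result matches the symptom described here: local curl works while every remote client is refused.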

Solution

# Correct configuration - bind to all network interfaces
server:
  port: 10089
  address: 0.0.0.0  # key setting: bind to all network interfaces

Verification

# Check the port binding
netstat -tlnp | grep 10089
# Expected: 0.0.0.0:10089 rather than 127.0.0.1:10089
# Test external access
curl http://SERVER_IP:10089/actuator/health
curl http://SERVER_IP:10089/actuator/prometheus

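Step 3: IP Address Problem in the Prometheus Configuration

The core problem

Even with the application bound to 0.0.0.0, Prometheus still could not scrape any data.

Incorrect Prometheus configuration

# prometheus.yml - incorrect configuration
scrape_configs:
  - job_name: 'xscholar-scheduler'
    static_configs:
      - targets: ['localhost:10089']        # wrong: this is localhost inside the container
      # or
      - targets: ['10.10.132.55:10089']     # wrong: the private IP is unreachable from the container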

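Root cause analysis

Container network isolation: Prometheus runs in a Docker container with its own network namespace
localhost resolution: inside the container, localhost refers to the container itself, not to the host
Private IP restriction: the container may not be able to reach the host's private IP directly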

Solution: use the public IP

# prometheus.yml - correct configuration
scrape_configs:
  - job_name: 'xscholar-scheduler'
    static_configs:
      - targets: ['PUBLIC_IP:10089']  # use the server's public IP
    metrics_path: '/actuator/prometheus'
    scrape_interval: 30s
    scrape_timeout: 10s

Network architecture

Internet
   ↓
Public IP (the server's public address)
   ↓
Server (runs the Spring Boot application)
   ↓ Docker network
Docker container (Prometheus)

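Step 4: IPv4/IPv6 Network Stack Problem

A more complex problem

After switching to the public IP, intermittent connection failures still occurred, and the logs showed network timeouts.

Symptoms

# Errors in the Prometheus log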
level=warn msg="Error on ingesting samples" err="connection refused"
level=warn msg="Scrape failed" target="PUBLIC_IP:10089" err="context deadline exceeded"

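Root cause analysis

Modern Linux servers usually support both IPv4 and IPv6. On such a dual-stack host the JVM may prefer IPv6 by default, which led to connection problems in this environment.

JVM network stack configuration problem

# Problem: the -D option comes after -jar, so it is passed to the application
# as a program argument and never reaches the JVM
java -jar app.jar -Djava.net.preferIPv4Stack=true

Solution

# Correct JVM startup parameters
java -Djava.net.preferIPv4Stack=true \
  -Djava.net.preferIPv6Addresses=false \
  -jar xscholar-scheduler.jar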

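Parameter notes

preferIPv4Stack=true: forces the JVM to use the IPv4 network stack
preferIPv6Addresses=false: tells the JVM not to prefer IPv6 addresses (this is already the default; it is set here to make the intent explicit)
Parameter position: must come before -jar, otherwise it has no effect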
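The position rule is easy to get wrong in long start scripts, because java treats everything after the jar file as arguments for the application's main method. The following is a small sketch of a check for that mistake; the function name jvm_opts_misplaced is hypothetical and it only looks at -D options.

```shell
#!/bin/bash
# jvm_opts_misplaced: hypothetical helper. Prints every -D option that appears
# after "-jar", where java passes it to the application's main() as a plain
# argument instead of treating it as a system property.
jvm_opts_misplaced() {
  local seen_jar=0 arg
  for arg in "$@"; do
    if [ "$seen_jar" -eq 1 ]; then
      case "$arg" in
        -D*) echo "$arg" ;;
      esac
    elif [ "$arg" = "-jar" ]; then
      seen_jar=1
    fi
  done
}

# Prints: -Djava.net.preferIPv4Stack=true
jvm_opts_misplaced java -jar app.jar -Djava.net.preferIPv4Stack=true
# Prints nothing: the option is in the right place
jvm_opts_misplaced java -Djava.net.preferIPv4Stack=true -jar app.jar
```

Spring Boot arguments such as --spring.profiles.active=prod are meant for the application, so they correctly stay after the jar file and are not reported.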

Step 5: Verify the Metrics

Verify the metrics endpoint

# Check the basic JVM metrics
curl http://PUBLIC_IP:10089/actuator/prometheus | grep jvm_memory
# Check the custom business metrics
curl http://PUBLIC_IP:10089/actuator/prometheus | grep daily_task
# Check the number of output lines
curl http://PUBLIC_IP:10089/actuator/prometheus | wc -l
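Note that wc -l also counts the "# HELP" and "# TYPE" comment lines of the exposition format, so it overstates the number of samples. The sketch below counts only sample lines and lists the distinct metric names. The function names are hypothetical, and the sample text (including the metric name daily_task_total) is invented for illustration rather than taken from the application.

```shell
#!/bin/bash
# count_samples: counts sample lines in Prometheus text exposition format
# read from stdin, skipping "#" comment lines and blank lines.
count_samples() {
  grep -v -e '^#' -e '^[[:space:]]*$' | wc -l | tr -d ' '
}

# metric_names: prints the distinct metric names (the text before "{" or space).
metric_names() {
  grep -v -e '^#' -e '^[[:space:]]*$' | sed -E 's/[{ ].*$//' | sort -u
}

# Illustrative sample only; real output comes from /actuator/prometheus.
sample='# HELP jvm_memory_used_bytes The amount of used memory
# TYPE jvm_memory_used_bytes gauge
jvm_memory_used_bytes{area="heap",id="Eden Space"} 1.048576E7
jvm_memory_used_bytes{area="nonheap",id="Metaspace"} 5.24288E7
daily_task_total 3.0'

printf '%s\n' "$sample" | count_samples   # prints: 3
printf '%s\n' "$sample" | metric_names    # prints the two distinct names
```

Against the live endpoint, the same functions can be fed from curl -s http://PUBLIC_IP:10089/actuator/prometheus.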

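Verify in Prometheus

# Check the target status
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | select(.labels.job=="xscholar-scheduler")'
# Query a specific metric (-G with --data-urlencode keeps curl from interpreting the braces)
curl -s -G http://localhost:9090/api/v1/query --data-urlencode 'query=up{job="xscholar-scheduler"}'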

Complete Configuration Example

Spring Boot configuration

# application-prod.yml
server:
  port: 10089
  address: 0.0.0.0  # key setting: bind to all network interfaces
management:
  endpoints:
    web:
      exposure:
        include: "health,info,prometheus"
  endpoint:
    prometheus:
      enabled: true
    health:
      show-details: always
  metrics:
    export:
      prometheus:
        enabled: true
    tags:
      application: xscholar-scheduler
      environment: production

Prometheus configuration

# prometheus.yml
scrape_configs:
  - job_name: 'xscholar-scheduler'
    static_configs:
      - targets: ['PUBLIC_IP:10089']  # use the public IP
    metrics_path: '/actuator/prometheus'
    scrape_interval: 30s
    scrape_timeout: 10s
    honor_labels: true
    scheme: http

JVM startup configuration

#!/bin/bash
# start-app.sh
java -Djava.net.preferIPv4Stack=true \
  -Djava.net.preferIPv6Addresses=false \
  -Duser.timezone=Asia/Shanghai \
  -Xms1g -Xmx2g \
  -jar xscholar-scheduler.jar \
  --spring.profiles.active=prod

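Network Troubleshooting Procedure

Layer 1: Application checks

# 1. Check the application's port binding
netstat -tlnp | grep 10089
# 2. Test local access
curl http://localhost:10089/actuator/health
# 3. Test access over the private IP
curl http://INTERNAL_IP:10089/actuator/health
# 4. Test access over the public IP
curl http://PUBLIC_IP:10089/actuator/health

Layer 2: Network connectivity checks

# 1. Check the firewall (-n prints numeric ports so the grep can match)
sudo ufw status
sudo iptables -L -n | grep 10089
# 2. Test port reachability
telnet PUBLIC_IP 10089
# 3. Test from the Prometheus container
docker exec prometheus wget -O- http://PUBLIC_IP:10089/actuator/prometheus

Layer 3: Container network checks

# 1. Inspect the container network configuration
docker network ls
docker network inspect prometheus_monitoring
# 2. Test connectivity from the container to the host
docker exec prometheus ping -c 3 PUBLIC_IP
# 3. Test name resolution
docker exec prometheus nslookup PUBLIC_IP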

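Pitfalls Summary

Main difficulties

Insufficient understanding of network binding: the difference between localhost and 0.0.0.0
Container network isolation: how the Docker container network relates to the host network
IP address selection: reachability of the private IP versus the public IP
IPv4/IPv6 stack: the JVM's network stack preference

Key lessons learned

Troubleshoot layer by layer: work through application, network, and container in that order
Understand the network: know how the container network and the host network relate
Parameter order: the position of a JVM option determines whether it takes effect
Verify the configuration: validate each layer's configuration independently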
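The layer-by-layer approach can be sketched as a small runner that stops at the first failing layer, so that later checks are not misread while an earlier layer is still broken. This is only an illustration: the function name first_failing_layer and the name=command argument format are assumptions, not part of the original procedure.

```shell
#!/bin/bash
# first_failing_layer: hypothetical helper. Takes "name=command" pairs in
# order (for example application, network, container) and prints the first
# layer whose command fails, or "all-ok" when every check passes.
first_failing_layer() {
  local pair name cmd
  for pair in "$@"; do
    name="${pair%%=*}"   # text before the first "="
    cmd="${pair#*=}"     # everything after the first "="
    if ! sh -c "$cmd" >/dev/null 2>&1; then
      echo "$name"
      return 1
    fi
  done
  echo "all-ok"
}

# Demonstration with stand-in commands; prints: network
first_failing_layer "application=true" "network=false" "container=true" || true

# With the checks from this article it could be called as:
# first_failing_layer \
#   "application=curl -sf http://localhost:10089/actuator/health" \
#   "network=curl -sf http://PUBLIC_IP:10089/actuator/health" \
#   "container=docker exec prometheus wget -qO- http://PUBLIC_IP:10089/actuator/prometheus"
```

Running the checks in a fixed order keeps each result interpretable: a container-level failure only means something once the application and network layers have passed.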

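Best Practices

Network configuration conventions

Application binding: set server.address to 0.0.0.0 explicitly in production
IP address selection: prefer the public IP so that every component can reach the target
IPv4 first: force IPv4 in production to avoid compatibility problems

Troubleshooting toolkit

# Network diagnostic tools
netstat -tlnp | grep PORT        # check the port binding
ss -tlnp | grep PORT             # modern replacement for netstat
curl -I http://IP:PORT           # HTTP connectivity test
telnet IP PORT                   # TCP connectivity test
nmap -p PORT IP                  # port scan

Monitoring verification checklist

The application port is bound to 0.0.0.0
Firewall rules allow the port
The metrics endpoint returns valid data
Prometheus can scrape the target successfully
The target status shows UP
The metrics can be queried in Prometheus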

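Common Mistakes

Mistake 1: Listening only on localhost

# Problem configuration
server:
  port: 10089
  # no explicit address; in this deployment the listener ended up on 127.0.0.1 only

Symptom: local curl works, remote access fails
Fix: add address: 0.0.0.0

Mistake 2: Using localhost inside the container

# Incorrect configuration
- targets: ['localhost:10089']

Symptom: Prometheus cannot connect to the target
Fix: use the host's public IP

Mistake 3: JVM option in the wrong position

# Incorrect startup command
java -jar app.jar -Djava.net.preferIPv4Stack=true

Symptom: IPv6 is preferred and connections fail
Fix: the option must come before -jar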

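Performance Considerations

Scrape frequency

# Adjust the scrape frequency to the needs of the workload
scrape_configs:
  - job_name: 'xscholar-scheduler'
    scrape_interval: 30s    # scrape the business application every 30 seconds
    scrape_timeout: 10s     # time out after 10 seconds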
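Two numbers follow directly from these settings: a 30-second interval means 86400 / 30 = 2880 scrapes per target per day, and Prometheus rejects a configuration whose scrape_timeout is greater than its scrape_interval. The sketch below checks both. The function names are hypothetical, and only simple single-unit durations in seconds, minutes, or hours are handled.

```shell
#!/bin/bash
# to_seconds: converts a simple duration such as "30s", "1m" or "2h" to
# seconds. Compound or sub-second values are rejected in this sketch.
to_seconds() {
  local num="${1%?}" unit="${1#"${1%?}"}"
  case "$num" in
    ''|*[!0-9]*) echo "unsupported duration: $1" >&2; return 1 ;;
  esac
  case "$unit" in
    s) echo "$num" ;;
    m) echo $(( num * 60 )) ;;
    h) echo $(( num * 3600 )) ;;
    *) echo "unsupported duration: $1" >&2; return 1 ;;
  esac
}

# check_scrape_timing INTERVAL TIMEOUT: prints scrapes per day, or fails
# when the timeout is longer than the interval.
check_scrape_timing() {
  local interval timeout
  interval=$(to_seconds "$1") || return 1
  timeout=$(to_seconds "$2") || return 1
  if [ "$timeout" -gt "$interval" ]; then
    echo "scrape_timeout ($2) exceeds scrape_interval ($1)" >&2
    return 1
  fi
  echo $(( 86400 / interval ))
}

check_scrape_timing 30s 10s   # prints: 2880
```

Multiplying the scrapes per day by the number of samples per scrape gives a rough daily sample count for the job, which is the figure that drives storage.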

Metric filtering

# Keep only the metrics that are needed, to reduce storage pressure
metric_relabel_configs:
  - source_labels: [__name__]
    regex: '(daily_task_.*|token_.*|last_task_.*|jvm_memory_.*)'
    action: keep
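Before applying a keep rule it helps to see which metric names it would retain. Prometheus anchors relabel regexes at both ends, so the sketch below uses grep -x (whole-line match) to approximate that behaviour with the same pattern. The function name kept_by_filter is hypothetical, and daily_task_total is an invented name used only as sample input.

```shell
#!/bin/bash
# Same pattern as the keep rule above.
KEEP_REGEX='(daily_task_.*|token_.*|last_task_.*|jvm_memory_.*)'

# kept_by_filter: reads metric names from stdin and prints those the keep
# rule would retain. grep -x requires the whole name to match, mirroring
# the implicit anchoring of Prometheus relabel regexes.
kept_by_filter() {
  grep -E -x "$KEEP_REGEX"
}

# Prints jvm_memory_used_bytes and daily_task_total; the other two are dropped.
printf '%s\n' \
  jvm_memory_used_bytes \
  jvm_gc_pause_seconds_count \
  daily_task_total \
  http_server_requests_seconds_count \
  | kept_by_filter
```

Because of the anchoring, a name such as my_token_count is dropped even though it contains token_, which is worth knowing before relying on the filter for custom metrics.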

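Next Steps

With the Spring Boot application now connected to Prometheus, the next phase will focus on:

Designing and implementing custom business metrics
Analyzing metric data and tuning alerting rules
Performance monitoring and problem diagnosis in practice

The focus of this phase was solving the network connectivity problems so that monitoring data can be collected reliably, laying the groundwork for business monitoring and alerting later on.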
