Series articles
Registering k8s services to Consul
Prometheus monitoring labels
Table of Contents
- Series articles
- Preface
- I. Environment
- II. Deploying Prometheus
- 1. Download
- 2. Deploy
- 3. Verify
- III. Adding custom scrape jobs to kube-prometheus
- 1. Prepare the yaml file
- 2. Create a new secret and apply it to Prometheus
- 3. Apply the yaml file to the cluster
- 4. Restart the prometheus-k8s pods
- 5. Open the Prometheus UI
- IV. Consul-based service discovery in k8s, in practice
- 1. Example nginx.yaml
- 2. Create the nginx pod
- 3. Check that the corresponding job_name appears under Prometheus Targets
- V. Bringing up the alerting pipeline
- 1. Edit the alertmanager-secret.yaml file
- 2. Start the alertWebhook pod
- 3. Test that alerts come through
- Summary
Preface
With cloud-native technology booming, Kubernetes (K8S) has become the de facto standard for container orchestration, and monitoring, as the core of system stability and observability, matters more than ever. Prometheus, with its powerful time-series collection and flexible query language (PromQL), is the cornerstone of cloud-native monitoring. In a dynamic K8S environment, however, statically configured targets struggle to keep up with frequent scale-out/scale-in and instance migration. Automating the discovery and dynamic management of monitoring targets has become a key operational challenge.
Service discovery exists to solve exactly this. Consul, a mature service mesh and distributed service discovery tool, tracks the registration and health of services in a K8S cluster in real time and integrates seamlessly with Prometheus, giving the monitoring system dynamic awareness. The combination not only simplifies configuration but also gives the monitoring stack the "self-healing" and "self-adapting" qualities of cloud-native systems.
This article is hands-on: it walks through the full integration of Prometheus and Consul on K8S, wired up to a self-built alertwebhook notification tool, covering:
1. Environment and architecture: building a K8S cluster from scratch and a standardized deployment of Prometheus and Consul;
2. Dynamic service discovery: auto-registering service instances via Consul so Prometheus discovers scrape targets dynamically;
3. Configuration tuning: relabel rules, scrape strategies, and alert-rule tuning tips;
4. Troubleshooting guide: efficient approaches for broken service discovery, failed scrapes, and similar scenarios;
5. Alert channels: notifications via DingTalk, email, and WeChat Work.
The overall architecture is shown below.
I. Environment
A minimal k8s 1.28 cluster
Pods auto-registered to Consul (see the series articles at the top for details)
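The pod-to-Consul registration from the series articles above boils down to one call against Consul's agent HTTP API. A minimal Python sketch of the request body, assuming the standard `/v1/agent/service/register` endpoint; the service name, tag, and meta keys below are illustrative stand-ins, not necessarily the exact ones the registration image uses:

```python
import json

def build_registration(pod_name, pod_ip, pod_port):
    """Build the body for PUT /v1/agent/service/register (Consul agent API)."""
    return {
        "ID": f"{pod_name}-{pod_port}",
        "Name": pod_name,
        "Address": pod_ip,
        "Port": pod_port,
        "Tags": ["container"],               # matched later by the relabel "keep" rule
        "Meta": {"podPort": str(pod_port)},  # surfaces as __meta_consul_service_metadata_podPort
    }

body = build_registration("nginx-6f9c", "10.244.1.15", 80)
print(json.dumps(body, indent=2))
# The real call would be:
#   PUT http://<consul>:8500/v1/agent/service/register
#   with header X-Consul-Token: <ACL token>
```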
II. Deploying Prometheus
1. Download
Commands (example):
[root@k8s-master ~]# git clone https://github.com/prometheus-operator/kube-prometheus.git
[root@k8s-master ~]# cd kube-prometheus
2.部署
[root@k8s-master ~]# kubectl apply --server-side -f manifests/setup
[root@k8s-master ~]# until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done
[root@k8s-master ~]# kubectl apply -f manifests/
3. Verify
Once the deployment succeeds, the result looks like the following (if it fails, swap in alternative image registries by hand).
III. Adding custom scrape jobs to kube-prometheus
1. Prepare the yaml file
Configuration (example):
[root@k8s-master prometheus]# cat prometheus-additional.yaml
- job_name: 'consul-k8s'        # custom job name
  scrape_interval: 10s
  consul_sd_configs:
  - server: 'consul-server.middleware.svc.cluster.local:8500'  # Consul node address and the port exposed by its Service
    token: "9bfbe81f-2648-4673-af14-d13e0a170050"              # the Consul ACL token
  relabel_configs:
  # 1. Keep only services whose tags contain "container"
  - source_labels: [__meta_consul_tags]
    regex: .*container.*
    action: keep
  # 2. Set the scrape address to the service's ip:port
  - source_labels: [__meta_consul_service_address]
    target_label: __address__
    replacement: "$1:9113"      # 9113 is the nginx-exporter port; change it if yours differs
  # 3. Other label mappings (replace the Consul labels with those from your own environment;
  #    if you use the Consul registration tool from the series articles at the top, no changes are needed)
  #    See the series article on Prometheus monitoring labels at the top for details
  - source_labels: [__meta_consul_service_address]
    target_label: ip
  - source_labels: [__meta_consul_service_metadata_podPort]
    target_label: port
  - source_labels: [__meta_consul_service_metadata_project]
    target_label: project
  - source_labels: [__meta_consul_service_metadata_monitorType]
    target_label: monitorType
  - source_labels: [__meta_consul_service_metadata_hostNode]
    target_label: hostNode
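To make the two filtering rules above concrete, here is a small Python simulation of what Prometheus does with them; the target dicts are illustrative stand-ins for real Consul SD metadata:

```python
import re

def relabel(target):
    """Mimic the keep + __address__ rules from prometheus-additional.yaml."""
    # Rule 1: keep only targets whose Consul tags contain "container"
    if not re.fullmatch(r".*container.*", target["__meta_consul_tags"]):
        return None  # target is dropped
    # Rule 2: scrape the service address on the exporter port 9113
    addr = target["__meta_consul_service_address"]
    target["__address__"] = re.sub(r"(.*)", r"\1:9113", addr, count=1)
    return target

kept = relabel({"__meta_consul_tags": ",container,",
                "__meta_consul_service_address": "10.244.1.15"})
dropped = relabel({"__meta_consul_tags": ",db,",
                   "__meta_consul_service_address": "10.244.1.16"})
print(kept["__address__"])  # 10.244.1.15:9113
print(dropped)              # None
```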
2. Create a new secret and apply it to Prometheus
# Create the secret
[root@k8s-master prometheus]# kubectl create secret generic additional-scrape-configs -n monitoring --from-file=prometheus-additional.yaml --dry-run=client -o yaml > ./additional-scrape-configs.yaml
# Apply it to the cluster
[root@k8s-master prometheus]# kubectl apply -f additional-scrape-configs.yaml -n monitoring
[root@k8s-master prometheus]# kubectl get secrets -n monitoring
NAME                        TYPE     DATA   AGE
additional-scrape-configs   Opaque   1      3h18m
3. Apply the yaml file to the cluster
Add the following configuration to the file:
[root@k8s-master prometheus]# vim manifests/prometheus-prometheus.yaml
......
  additionalScrapeConfigs:
    name: additional-scrape-configs   # must match the secret name created above
    key: prometheus-additional.yaml
......
# Apply the change to the cluster
[root@k8s-master prometheus]# kubectl apply -f manifests/prometheus-prometheus.yaml -n monitoring
4. Restart the prometheus-k8s pods
[root@k8s-master prometheus]# kubectl rollout restart -n monitoring statefulset prometheus-k8s
5. Open the Prometheus UI
Check the target list in Prometheus, or search under Status → Configuration for the job_name consul-k8s to confirm the config was picked up.
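The same check can be scripted against Prometheus' `/api/v1/targets` endpoint instead of clicking through the UI. A sketch that filters the response for our job; the sample response below is an illustrative stand-in for what you would fetch from prometheus-k8s:9090:

```python
def job_health(targets_response, job):
    """Return (instance, health) for every active target belonging to one job."""
    return [
        (t["labels"].get("instance"), t["health"])
        for t in targets_response["data"]["activeTargets"]
        if t["labels"].get("job") == job
    ]

# Illustrative stand-in for: curl http://prometheus-k8s.monitoring:9090/api/v1/targets
sample = {"data": {"activeTargets": [
    {"labels": {"job": "consul-k8s", "instance": "10.244.1.15:9113"}, "health": "up"},
    {"labels": {"job": "apiserver", "instance": "192.168.75.10:6443"}, "health": "up"},
]}}
print(job_health(sample, "consul-k8s"))  # [('10.244.1.15:9113', 'up')]
```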
IV. Consul-based service discovery in k8s, in practice
Prepare an nginx.yaml that uses the Consul auto-registration image to register the pod into Consul; the Consul service discovery configured above then picks the pod up for monitoring.
1. Example nginx.yaml
nginx's built-in stub_status module, combined with the nginx-exporter listening on port 9113, exposes nginx metrics so that Prometheus can scrape them from http://<pod IP>:9113/metrics.
[root@k8s-master consul]# cat nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: nginx
  name: nginx
  namespace: middleware
spec:
  replicas: 1
  selector:
    matchLabels:
      run: nginx
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      labels:
        run: nginx
    spec:
      tolerations:
      - key: "node-role.kubernetes.io/control-plane"
        operator: "Exists"
        effect: "NoSchedule"
      initContainers:
      - name: service-registrar
        image: harbor.jdicity.local/registry/pod_registry:v14
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: CONSUL_IP
          valueFrom:
            configMapKeyRef:
              name: global-config
              key: CONSUL_IP
        - name: ACL_TOKEN
          valueFrom:
            secretKeyRef:
              name: acl-token
              key: ACL_TOKEN
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        volumeMounts:
        - mountPath: /shared-bin        # shared volume mounted into the initContainer
          name: shared-bin
        command: ["sh", "-c"]
        args:
        - |
          cp /usr/local/bin/consulctl /shared-bin/ &&
          /usr/local/bin/consulctl register \
            "$CONSUL_IP" \
            "$ACL_TOKEN" \
            "80" \
            "容器監控" \
            "k8s"
      containers:
      - image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/nginx:stable
        env:
        - name: CONSUL_IP               # must be declared explicitly
          valueFrom:
            configMapKeyRef:
              name: global-config
              key: CONSUL_IP
        - name: ACL_TOKEN               # must be declared explicitly
          valueFrom:
            secretKeyRef:
              name: acl-token
              key: ACL_TOKEN
        - name: CONSUL_NODE_NAME
          value: "consul-0"
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        lifecycle:
          preStop:
            exec:
              command: ["sh", "-c", "/usr/local/bin/consulctl deregister $CONSUL_IP $ACL_TOKEN 80 $CONSUL_NODE_NAME"]
        imagePullPolicy: IfNotPresent
        name: nginx
        volumeMounts:
        - mountPath: /usr/local/bin/consulctl   # mount consulctl into the nginx container's PATH
          name: shared-bin
          subPath: consulctl
        - name: nginx-config
          mountPath: /etc/nginx/nginx.conf
          subPath: nginx.conf
        livenessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 3
          periodSeconds: 3
        ports:
        - containerPort: 80
      - name: nginx-exporter            # exporter sidecar
        image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/nginx/nginx-prometheus-exporter:1.3.0
        args:
        - "--nginx.scrape-uri=http://localhost:80/stub_status"   # new flag format
        ports:
        - containerPort: 9113
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      volumes:
      - name: shared-bin                # shared volume
        emptyDir: {}
      - name: nginx-config
        configMap:
          name: nginx-config
The ConfigMap file:
[root@k8s-master consul]# cat nginx-config.yaml
# nginx-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-config
  namespace: middleware
data:
  nginx.conf: |
    user nginx;
    worker_processes auto;
    error_log /var/log/nginx/error.log notice;
    pid /var/run/nginx.pid;
    events {
        worker_connections 1024;
    }
    http {
        include /etc/nginx/mime.types;
        default_type application/octet-stream;
        server {
            listen 80;
            location /stub_status {
                stub_status;
                allow 127.0.0.1;
                deny all;
            }
            location / {
                root /usr/share/nginx/html;
                index index.html index.htm;
            }
        }
    }
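nginx-exporter does the conversion for us, but it helps to see what the stub_status endpoint configured above actually returns. A rough Python sketch of parsing the standard stub_status text format into the counters the exporter exposes:

```python
import re

def parse_stub_status(text):
    """Parse nginx stub_status output into a dict of counters."""
    m = re.search(
        r"Active connections:\s*(\d+).*?"
        r"(\d+)\s+(\d+)\s+(\d+).*?"
        r"Reading:\s*(\d+)\s*Writing:\s*(\d+)\s*Waiting:\s*(\d+)",
        text, re.S)
    keys = ("active", "accepts", "handled", "requests",
            "reading", "writing", "waiting")
    return dict(zip(keys, map(int, m.groups())))

# Typical output of: curl http://localhost/stub_status
sample = """Active connections: 2
server accepts handled requests
 16 16 31
Reading: 0 Writing: 1 Waiting: 1
"""
print(parse_stub_status(sample))
```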
2. Create the nginx pod
[root@k8s-master consul]# kubectl apply -f nginx-config.yaml
[root@k8s-master consul]# kubectl apply -f nginx.yaml
Once the pod's init container has run, the pod is registered into Consul, and Prometheus then monitors it through the configured Consul service discovery.
3. Check that the corresponding job_name appears under Prometheus Targets
At this point, Prometheus is successfully collecting the corresponding metrics.
V. Bringing up the alerting pipeline
alertwebhook source: https://gitee.com/wd_ops/alertmanager-webhook_v2
The repository contains the source code, the image build files, the yaml for starting alertwebhook, and the alerting architecture diagram, so they are not repeated here.
1. Edit the alertmanager-secret.yaml file
The alertWebHook tool, written by the author, supports three notification channels: email, DingTalk, and WeChat Work.
[root@k8s-master manifests]# cat alertmanager-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-main
  namespace: monitoring
stringData:
  alertmanager.yaml: |-
    global:
      resolve_timeout: 5m
    route:
      group_by: ['alertname']
      group_interval: 10s
      group_wait: 10s
      receiver: 'webhook'
      repeat_interval: 5m
    receivers:
    - name: 'webhook'
      webhook_configs:
      - url: "http://alertmanager-webhook.monitoring.svc.cluster.local:19093/api/v1/wechat"
      - url: "http://alertmanager-webhook.monitoring.svc.cluster.local:19093/api/v1/email"
      - url: "http://alertmanager-webhook.monitoring.svc.cluster.local:19093/api/v1/dingding"
type: Opaque
[root@k8s-master manifests]# kubectl apply -f alertmanager-secret.yaml
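The three URLs above all receive the same thing: Alertmanager's standard webhook JSON (format version "4"). A minimal sketch of extracting the fields a notifier typically formats, using an illustrative payload:

```python
import json

def summarize(payload):
    """Pull out the fields a notification channel usually formats."""
    return [
        {
            "name": a["labels"].get("alertname", "unknown"),
            "status": a["status"],  # "firing" or "resolved"
            "summary": a.get("annotations", {}).get("summary", ""),
        }
        for a in payload["alerts"]
    ]

# Illustrative Alertmanager webhook body (format version 4)
payload = {
    "version": "4",
    "status": "firing",
    "alerts": [{
        "status": "firing",
        "labels": {"alertname": "KubePodCrashLooping", "namespace": "middleware"},
        "annotations": {"summary": "Pod is crash looping"},
    }],
}
print(json.dumps(summarize(payload), ensure_ascii=False))
```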
2. Start the alertWebhook pod
For the email, DingTalk, and WeChat Work keys and secrets below, consult the respective official documentation; they are not covered here.
[root@k8s-master YamlTest]# cat alertWebhook.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alertmanager-webhook
  namespace: monitoring        # choose a namespace to suit your environment
  labels:
    app: alertmanager-webhook
spec:
  replicas: 1
  selector:
    matchLabels:
      app: alertmanager-webhook
  template:
    metadata:
      labels:
        app: alertmanager-webhook
    spec:
      containers:
      - name: webhook
        image: harbor.jdicity.local/registry/alertmanager-webhook:v4.0
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 19093
          protocol: TCP
        resources:
          requests:
            memory: "256Mi"
            cpu: "50m"
          limits:
            memory: "512Mi"
            cpu: "100m"
        volumeMounts:
        - name: logs
          mountPath: /export/alertmanagerWebhook/logs
        - name: config
          mountPath: /export/alertmanagerWebhook/settings.yaml
          subPath: settings.yaml
      volumes:
      - name: logs
        emptyDir: {}
      - name: config
        configMap:
          name: alertmanager-webhook-config
---
# The config file is managed via a ConfigMap (recommended)
apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-webhook-config
  namespace: monitoring
data:
  settings.yaml: |
    DingDing:
      enabled: false
      dingdingKey: "9zzzzc39"
      signSecret: "SEzzzff859a7b"
      chatId: "chat3zz737e49beb9"
      atMobiles:
      - "14778987659"
      - "17657896784"
    QyWeChat:
      enabled: true
      qywechatKey: "4249406zz305"
      corpID: "ww4zzz7b"
      corpSecret: "mM23zOozwEZM"
      atMobiles:
      - "14778987659"
    Email:
      enabled: true
      smtp_host: "smtp.163.com"
      smtp_port: 25
      smtp_from: "rzzxd@163.com"
      smtp_password: "UzzH"
      smtp_to: "1zz030@qq.com"
    Redis:
      redisServer: "redis-master.redis.svc.cluster.local"
      mode: "master-slave"     # single/master-slave/cluster
      redisPort: "6379"        # master node port
      redisPassword: "G0LzzW"
      requirePassword: true
      # master-slave mode settings
      slaveNodes:
      - "redis-slave.redis.svc.cluster.local:6379"
      # cluster mode settings
      clusterNodes:
      - "192.168.75.128:7001"
      - "192.168.75.128:7002"
      - "192.168.75.128:7003"
    System:
      projectName: "測試項目"
      prometheus_addr: "prometheus-k8s.monitoring.svc.cluster.local:9090"
      host: 0.0.0.0
      port: 19093
      env: release
      logFileDir: /export/alertmanagerWebhook/logs/
      logFilePath: alertmanager-webhook.log
      logMaxSize: 100
      logMaxBackup: 5
      logMaxDay: 30
---
# Service definition
apiVersion: v1
kind: Service
metadata:
  name: alertmanager-webhook
  namespace: monitoring
  labels:
    app: alertmanager-webhook
spec:
  type: ClusterIP                  # default type; cluster-internal access only
  selector:
    app: alertmanager-webhook      # must match the Deployment's pod labels
  ports:
  - name: http
    port: 19093          # port exposed by the Service
    targetPort: 19093    # matches the container's containerPort
    protocol: TCP
3. Test that alerts come through
The k8s cluster currently has active alerts; check whether the notifications arrive.
Start alertWebhook:
[root@k8s-master YamlTest]# kubectl apply -f alertWebhook.yaml
deployment.apps/alertmanager-webhook created
configmap/alertmanager-webhook-config created
service/alertmanager-webhook created
Sample log output for the email channel:
DingTalk
WeChat Work
Email
Summary
With that, a complete open-source solution for service registration, monitoring, and alerting is up and running!