Creating alert rules with Prometheus Operator and sending alerts to DingTalk

Prometheus alerting via DingTalk

Preface
1. Add Prometheus alert rules
- 1.2 Add a custom alert rule file
2. Configure DingTalk alerting
- 2.2 Deploy the DingTalk plugin
3. Write the Alertmanager configuration file

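Preface

I installed kube-prometheus (which includes Prometheus Operator) on Kubernetes and it ran without problems. What I did not know was how to configure rule files and how to configure the receivers of alert messages.

1. Add Prometheus alert rules

kubectl get Prometheus -n <namespace>   (the namespace where Prometheus Operator is installed)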

root@master2:~/dingtalk# kubectl get prometheus -n monitoring
NAME   VERSION   DESIRED   READY   RECONCILED   AVAILABLE   AGE
k8s    2.52.0    2         2       True         True        23h

View the details:

root@master2:~/dingtalk# kubectl get prometheus -n monitoring k8s -oyaml 
...
  ruleNamespaceSelector: {}
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  scrapeConfigNamespaceSelector: {}
  scrapeConfigSelector: {}
...

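The important part here is matchLabels. Every PrometheusRule object must carry the labels "prometheus: k8s" and "role: alert-rules"; only then does the operator discover it and load its rules automatically. Label keys and values are case-sensitive, so they must match exactly.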
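The matching rule can be illustrated with a small Python sketch. It is not the operator's code; `matches_selector` is a hypothetical helper that only mimics how a `matchLabels` selector behaves (every listed key/value must be present on the object, compared case-sensitively).

```python
def matches_selector(object_labels: dict, match_labels: dict) -> bool:
    """Return True when every key/value pair in match_labels appears in object_labels."""
    return all(object_labels.get(key) == value for key, value in match_labels.items())


# Selector taken from the Prometheus resource shown above.
rule_selector = {"prometheus": "k8s", "role": "alert-rules"}

# Extra labels on the rule object do not prevent a match.
good_labels = {"prometheus": "k8s", "role": "alert-rules", "team": "ops"}
# Capitalised keys are different keys, so this object is not selected.
bad_labels = {"Prometheus": "k8s", "Role": "alert-rules"}

print(matches_selector(good_labels, rule_selector))
print(matches_selector(bad_labels, rule_selector))
```

If a rule file is created but never shows up in Prometheus, comparing its labels against the selector in this way is a reasonable first check.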

1.2 Add a custom alert rule file

vim bm-custom-rules.yaml

The content is as follows. Pay attention to the labels:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: prometheus-k8s-rules-wzq
  namespace: monitoring
spec:
  groups:
    - name: host-status-alerts
      rules:
        - alert: NodeMemory
          expr: (1 - (node_memory_MemAvailable_bytes / (node_memory_MemTotal_bytes))) * 100 > 85
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: "Memory usage is too high!"
            description: "Node {{$labels.instance}} memory usage is above 85% (current: {{$value}}%)"
        - alert: NodeTCPSessions
          expr: node_netstat_Tcp_CurrEstab > 1000
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: "Too many TCP_ESTABLISHED connections!"
            description: "{{$labels.instance}} has more than 1000 TCP_ESTABLISHED connections (current: {{$value}})"
        - alert: NodeDiskCapacity
          expr: max((node_filesystem_size_bytes{fstype=~"ext.?|xfs"}-node_filesystem_free_bytes{fstype=~"ext.?|xfs"}) *100/(node_filesystem_avail_bytes{fstype=~"ext.?|xfs"}+(node_filesystem_size_bytes{fstype=~"ext.?|xfs"}-node_filesystem_free_bytes{fstype=~"ext.?|xfs"})))by(instance) > 80
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: "Disk partition usage is too high!"
            description: "{{$labels.instance}} disk partition usage is above 80% (current: {{$value}}%)"
        - alert: NodeCPU
          expr: (100 - (avg by (instance) (irate(node_cpu_seconds_total{job=~".*",mode="idle"}[5m])) * 100)) > 85
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: "CPU usage is too high!"
            description: "{{$labels.instance}} CPU usage is above 85% (current: {{$value}}%)"
        - alert: NodeDown
          expr: up{job='node-exporter'} != 1
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: "The node is down"
            description: "{{$labels.instance}} is down (up = {{$value}})"
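To make the memory and disk expressions above easier to check by hand, here is the same arithmetic as a Python sketch. The function names and the sample byte values are made up for illustration; the formulas are the ones in the rule file. Note that the disk formula divides by avail + used rather than by size: node_filesystem_free_bytes includes blocks reserved for root, while node_filesystem_avail_bytes does not, so the result is the usage as seen by ordinary users.

```python
def memory_used_percent(mem_available_bytes: float, mem_total_bytes: float) -> float:
    """(1 - MemAvailable / MemTotal) * 100, as in the memory rule."""
    return (1 - mem_available_bytes / mem_total_bytes) * 100


def disk_used_percent(size_bytes: float, free_bytes: float, avail_bytes: float) -> float:
    """(size - free) * 100 / (avail + (size - free)), as in the disk rule."""
    used = size_bytes - free_bytes
    return used * 100 / (avail_bytes + used)


GIB = 2 ** 30

# Hypothetical node: 16 GiB of RAM with 2 GiB still available.
mem = memory_used_percent(2 * GIB, 16 * GIB)
print(mem, mem > 85)

# Hypothetical filesystem: 100 GiB total, 30 GiB free, 25 GiB available to non-root users.
disk = disk_used_percent(100 * GIB, 30 * GIB, 25 * GIB)
print(round(disk, 2), disk > 80)
```

With these sample numbers the memory rule would be over its 85% threshold, while the disk rule would stay under 80%.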

Save the file and create the object with:

kubectl create -f bm-custom-rules.yaml

Then check that the object was created:

kubectl get PrometheusRule -n monitoring

The object has been created. Enter the Prometheus container to confirm that the rule file was loaded:

kubectl exec -it prometheus-k8s-0 -n <namespace> -- /bin/sh
cd /etc/prometheus/rules/prometheus-k8s-rulefiles-0/
ls

At this point the custom Prometheus rules are deployed. Next, connect DingTalk and complete the alerting path.

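2. Configure DingTalk alerting

First, add a robot to the DingTalk group and obtain its token and secret.

Alertmanager receivers do not support the DingTalk URL directly, so a plugin container, prometheus-webhook-dingtalk, has to be deployed.

One more thing to note: when the receiver is DingTalk (webhook_configs), the alert template is not specified in the Alertmanager configuration file but in the prometheus-webhook-dingtalk plugin.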
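For background on what the robot's secret is used for: DingTalk custom robots with signing enabled expect a `timestamp` and a `sign` parameter on each request, where the signature is an HMAC-SHA256 over the timestamp and the secret. The plugin is expected to do this for you when `secret` is set in its config, so the following Python sketch is only meant to show the calculation. The helper names and the `SECexample` value are placeholders, not real credentials.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse


def dingtalk_sign(secret: str, timestamp_ms: int) -> str:
    """Sign "<timestamp>\\n<secret>" with HMAC-SHA256, then Base64- and URL-encode it."""
    string_to_sign = f"{timestamp_ms}\n{secret}"
    digest = hmac.new(
        secret.encode("utf-8"), string_to_sign.encode("utf-8"), hashlib.sha256
    ).digest()
    return urllib.parse.quote_plus(base64.b64encode(digest))


def signed_url(webhook_url: str, secret: str, timestamp_ms: int) -> str:
    """Append the timestamp and signature to a robot webhook URL."""
    return f"{webhook_url}&timestamp={timestamp_ms}&sign={dingtalk_sign(secret, timestamp_ms)}"


# Placeholder values; substitute your own robot's token and secret.
now_ms = int(time.time() * 1000)
print(signed_url("https://oapi.dingtalk.com/robot/send?access_token=<token>", "SECexample", now_ms))
```

Because the timestamp is part of the signature, the machine that sends the request needs a reasonably accurate clock.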

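Write the prometheus-webhook-dingtalk configuration file and template

vim dingtalk-configmap.yaml; remember to replace the access token in the url (and the secret) with those of your own DingTalk robot.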

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-webhook-dingtalk-config
  namespace: monitoring
data:
  config.yml: |-
    templates:
      - /etc/prometheus-webhook-dingtalk/default.tmpl
    targets:
      webhook1:
        url: https://oapi.dingtalk.com/robot/send?access_token=1f315a3d3b68ae9a5df0f6cde411902c493a10bc3d6ed6bbba8cd8b4bcd1c848
        secret: SEC4d160d1d987b58a19e9a825b83715b253d0b6d0c255b5abb28c265798c535b7e
        message:
          text: '{{ template "default.tmpl" . }}'
  default.tmpl: |
    {{ define "default.tmpl" }}

    {{- if gt (len .Alerts.Firing) 0 -}}
    {{- range $index, $alert := .Alerts.Firing -}}

    ============ = **<font color='#FF0000'>Alert</font>** = =============

    **Alert name:**    {{ $alert.Labels.alertname }}

    **Severity:**    {{ $alert.Labels.severity }}

    **Status:**    {{ .Status }}

    **Instance:**    {{ $alert.Labels.instance }} {{ $alert.Labels.device }}

    **Summary:**    {{ .Annotations.summary }}

    **Details:**    {{ $alert.Annotations.message }}{{ $alert.Annotations.description}}

    **Started at:**    {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}

    ============ = end = =============

    {{- end }}
    {{- end }}

    {{- if gt (len .Alerts.Resolved) 0 -}}
    {{- range $index, $alert := .Alerts.Resolved -}}

    ============ = <font color='#00FF00'>Resolved</font> = =============

    **Instance:**    {{ .Labels.instance }}

    **Alert name:**    {{ .Labels.alertname }}

    **Severity:**    {{ $alert.Labels.severity }}

    **Status:**    {{ .Status }}

    **Summary:**    {{ $alert.Annotations.summary }}

    **Details:**    {{ $alert.Annotations.message }}{{ $alert.Annotations.description}}

    **Started at:**    {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}

    **Resolved at:**    {{ ($alert.EndsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}

    ============ = **end** = =============

    {{- end }}
    {{- end }}
    {{- end }}
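The `.Add 28800e9` calls in the template shift the alert timestamps, which Alertmanager reports in UTC, by 28800e9 nanoseconds (8 hours) so that they display as UTC+8. The Python sketch below reproduces that conversion so you can work out the offset for another time zone; `to_display_time` is a made-up name and the sample timestamp is arbitrary.

```python
from datetime import datetime, timedelta, timezone

# Offset used in the template, in nanoseconds (Go durations are nanosecond counts).
OFFSET_NS = 28800e9


def to_display_time(starts_at_utc: datetime, offset_ns: float = OFFSET_NS) -> str:
    """Shift a UTC timestamp by offset_ns and format it like Go's "2006-01-02 15:04:05"."""
    shifted = starts_at_utc + timedelta(seconds=offset_ns / 1e9)
    return shifted.strftime("%Y-%m-%d %H:%M:%S")


print(OFFSET_NS / 1e9 / 3600)  # offset expressed in hours
print(to_display_time(datetime(2024, 5, 27, 1, 30, 0, tzinfo=timezone.utc)))
```

For a different zone, replace 28800e9 in the template with the offset in hours multiplied by 3600e9.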

Then create the ConfigMap:

kubectl apply -f dingtalk-configmap.yaml

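2.2 Deploy the DingTalk plugin

The flag used to specify the configuration file differs between plugin versions. The version deployed here is v2.1.0.

vim dingtalk-webhook-deploy.yaml; this file needs no changes.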

apiVersion: v1
kind: Service
metadata:
  name: dingtalk
  namespace: monitoring
  labels:
    app: dingtalk
spec:
  selector:
    app: dingtalk
  ports:
    - name: dingtalk
      port: 8060
      protocol: TCP
      targetPort: 8060
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dingtalk
  namespace: monitoring
spec:
  replicas: 2
  selector:
    matchLabels:
      app: dingtalk
  template:
    metadata:
      name: dingtalk
      labels:
        app: dingtalk
    spec:
      containers:
        - name: dingtalk
          image: timonwong/prometheus-webhook-dingtalk:v2.1.0
          imagePullPolicy: IfNotPresent
          args:
            - --web.listen-address=:8060
            - --config.file=/etc/prometheus-webhook-dingtalk/config.yml
          ports:
            - containerPort: 8060
          volumeMounts:
            - name: config
              mountPath: /etc/prometheus-webhook-dingtalk
      volumes:
        - name: config
          configMap:
            name: prometheus-webhook-dingtalk-config

kubectl apply -f dingtalk-webhook-deploy.yaml

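3. Write the Alertmanager configuration file

In this file you need to add a route of your own (or use the default route) together with the corresponding receiver.

The receiver named webhook here is really just the address of the Service of the DingTalk plugin deployed above.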
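The receiver URL is assembled from the Service name, the port, and the target name defined in the plugin's config.yml. As a rough sketch, assuming the Service is called `dingtalk` in namespace `monitoring` and the target is `webhook1` as in the manifests above (the helper function itself is hypothetical):

```python
from typing import Optional


def dingtalk_webhook_url(service: str, target: str, port: int = 8060,
                         namespace: Optional[str] = None) -> str:
    """Build the plugin endpoint; pass namespace to get the cluster-wide DNS name."""
    host = service if namespace is None else f"{service}.{namespace}.svc.cluster.local"
    return f"http://{host}:{port}/dingtalk/{target}/send"


# Short name: resolves only from pods in the same namespace as the Service.
print(dingtalk_webhook_url("dingtalk", "webhook1"))
# Fully qualified name: usable from any namespace in the cluster.
print(dingtalk_webhook_url("dingtalk", "webhook1", namespace="monitoring"))
```

The target segment must match a key under `targets:` in config.yml, otherwise the plugin has nowhere to forward the alert.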

global:
  resolve_timeout: 1m
  smtp_smarthost: 'smtp.qq.com:25'
  smtp_from: '888888@qq.com'
  smtp_auth_username: '88888888@qq.com'
  smtp_auth_password: 'xxxxZXXX'
  smtp_require_tls: false
route:
  group_by: ['alertname','job']
  group_wait: 30s
  group_interval: 10s
  repeat_interval: 30m
  receiver: 'webhook'
  routes:
    - match:
        app: 'dingtalk'
      receiver: 'webhook'
receivers:
  - name: 'webhook'
    webhook_configs:
      # If Alertmanager runs in a different namespace from the plugin, use the full Service name:
      # http://dingtalk.monitoring.svc.cluster.local:8060/dingtalk/webhook1/send
      - url: 'http://dingtalk:8060/dingtalk/webhook1/send'
        send_resolved: true
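To see how the route section above decides where an alert goes, here is a deliberately simplified Python model of Alertmanager routing: child routes are tried in order, the first one whose `match` labels all equal the alert's labels wins, and otherwise the parent's receiver is used. It ignores `continue`, regex matching and `matchers`, so treat it as a sketch of the idea rather than Alertmanager's real algorithm; `pick_receiver` is a made-up name.

```python
def pick_receiver(alert_labels: dict, route: dict, inherited=None):
    """Walk the routing tree depth-first and return the receiver name for an alert."""
    receiver = route.get("receiver", inherited)
    for child in route.get("routes", []):
        match = child.get("match", {})
        if all(alert_labels.get(key) == value for key, value in match.items()):
            return pick_receiver(alert_labels, child, receiver)
    return receiver


# The route tree from the configuration above.
route = {
    "receiver": "webhook",
    "routes": [{"match": {"app": "dingtalk"}, "receiver": "webhook"}],
}

print(pick_receiver({"alertname": "NodeDown", "severity": "warning"}, route))
print(pick_receiver({"alertname": "Custom", "app": "dingtalk"}, route))
```

In this configuration both the default route and the sub-route point at the same receiver, so every alert ends up at the DingTalk plugin whether or not it carries the `app: dingtalk` label.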

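Save the configuration above as alertmanager.yaml. Then delete the existing secret object and recreate it from the new file: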

kubectl delete secret alertmanager-main -n monitoring
secret "alertmanager-main" deleted

kubectl create secret generic alertmanager-main --from-file=alertmanager.yaml -n monitoring
secret "alertmanager-main" created
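The file name matters in the command above: `--from-file=alertmanager.yaml` stores the file under a key equal to its name, and kube-prometheus expects the Alertmanager configuration under the key `alertmanager.yaml` in the `alertmanager-main` secret. The Python sketch below shows roughly what the resulting Secret's `data` field contains; `secret_data` is a hypothetical helper and the sample config is shortened.

```python
import base64


def secret_data(config_text: str, key: str = "alertmanager.yaml") -> dict:
    """Return the data mapping of a Secret: file name as key, Base64 of the content as value."""
    encoded = base64.b64encode(config_text.encode("utf-8")).decode("ascii")
    return {key: encoded}


sample_config = "route:\n  receiver: 'webhook'\n"
data = secret_data(sample_config)
print(data)

# Decoding the value gives back the original file content.
print(base64.b64decode(data["alertmanager.yaml"]).decode("utf-8"))
```

If the secret is created from a file with a different name, the key will differ and the configuration will most likely not be picked up.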

Delete the Alertmanager pods so that they are recreated.
Wait a while, then check the Alertmanager container logs for errors.
