1. Install Prometheus
1.1 Installation
Download Prometheus from the official site: https://prometheus.io/download/
sudo -i
mkdir -p /opt/prometheus
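# Move the extracted directory into /opt/prometheus/
mv prometheus-2.45 /opt/prometheus/
# Create a dedicated prometheus user: -M creates no home directory, -s sets a shell that disallows login
useradd -M -s /usr/sbin/nologin prometheus
# Give the prometheus user ownership of the directory: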
chown prometheus:prometheus -R /opt/prometheus
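The download and unpack step that precedes these commands can be sketched as below. This is a minimal sketch, not the author's exact procedure: the version 2.45.0, the linux-amd64 build, and the helper names `prometheus_url` and `fetch_prometheus` are assumptions for illustration. The release directory is renamed to the shorter prometheus-2.45 so that it matches the paths used in this guide.

```shell
#!/bin/sh
# Sketch: fetch and unpack a Prometheus release into /opt/prometheus.
# VERSION and ARCH are assumptions; pick the build you need from the download page.
VERSION="${VERSION:-2.45.0}"
ARCH="${ARCH:-linux-amd64}"

# Print the release tarball URL for a version ($1) and architecture ($2).
prometheus_url() {
    echo "https://github.com/prometheus/prometheus/releases/download/v$1/prometheus-$1.$2.tar.gz"
}

# Download, unpack, and move the release (run as root; not called automatically).
fetch_prometheus() {
    url=$(prometheus_url "$VERSION" "$ARCH")
    curl -fsSLO "$url" &&
    tar -xzf "prometheus-$VERSION.$ARCH.tar.gz" &&
    mkdir -p /opt/prometheus &&
    # ${VERSION%.*} strips the patch number: 2.45.0 becomes 2.45
    mv "prometheus-$VERSION.$ARCH" "/opt/prometheus/prometheus-${VERSION%.*}"
}

# Show the URL that would be downloaded.
prometheus_url "$VERSION" "$ARCH"
```

Running `fetch_prometheus` as root performs the actual download; the last line only prints the URL so it can be checked first.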
1.2 Edit the configuration
global:
  scrape_interval: 30s # Scrape targets every 30 seconds. The default is every 1 minute.
  evaluation_interval: 30s # Evaluate rules every 30 seconds. The default is every 1 minute.
  # Other global settings...
scrape_configs:
  # Prometheus self-monitoring
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9070"]
  - job_name: 'node_widgets'
    scheme: https # Use HTTPS
    tls_config:
      insecure_skip_verify: true # Skip certificate verification
    static_configs:
      - targets: ['xxxxx.xxx.com:443'] # Replace with the address and port of your server B
    metrics_path: '/prometheus/metrics' # Path where the Node Exporter metrics are exposed
After changing the configuration, validate it:
./promtool check config new_prometheus.yml
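Hot reload (this endpoint requires Prometheus to be started with --web.enable-lifecycle, as in the unit file in 1.3):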
curl -X POST http://localhost:9070/-/reload
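The two commands above can be combined so that a broken file is never reloaded. The following is a minimal sketch; the function name `reload_if_valid` and the overridable variables `PROMTOOL`, `CONFIG`, and `RELOAD_URL` are assumptions introduced for illustration.

```shell
#!/bin/sh
# Sketch: reload Prometheus only when the configuration passes validation.
PROMTOOL="${PROMTOOL:-./promtool}"
CONFIG="${CONFIG:-new_prometheus.yml}"
RELOAD_URL="${RELOAD_URL:-http://localhost:9070/-/reload}"

reload_if_valid() {
    if "$PROMTOOL" check config "$CONFIG"; then
        # -f makes curl return non-zero on an HTTP error status
        curl -fsS -X POST "$RELOAD_URL"
    else
        echo "configuration is invalid, not reloading" >&2
        return 1
    fi
}
```

Call `reload_if_valid` from the Prometheus directory after each edit; a non-zero exit status means the running configuration was left untouched.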
1.3 Start on boot
vim /etc/systemd/system/prometheus.service
Write the following content:
[Unit]
Description=Prometheus Server
After=network-online.target
[Service]
Type=simple
User=prometheus
Group=prometheus
Restart=on-failure
WorkingDirectory=/opt/prometheus/prometheus-2.45
ExecStart=/opt/prometheus/prometheus-2.45/prometheus --web.listen-address ":9070" --config.file /opt/prometheus/prometheus-2.45/new_prometheus.yml --storage.tsdb.path /opt/prometheus/prometheus-2.45/data --storage.tsdb.retention.time=20d --web.enable-lifecycle
[Install]
WantedBy=multi-user.target
Enable the service at boot and start it:
sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl restart prometheus
sudo systemctl status prometheus
2. Install node_exporter
2.1 Official download page: https://prometheus.io/download/
2.2 Start on boot
Create the unit file:
sudo vim /etc/systemd/system/node_exporter.service
[Unit]
Description=node_exporter
Documentation=https://prometheus.io/
After=network.target
[Service]
User=ubuntu
Group=ubuntu
ExecStart=/opt/prometheus/node_exporter-1.7.0/node_exporter --web.listen-address=":9101"
Restart=on-failure
[Install]
WantedBy=multi-user.target
Enable and start the service:
sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl restart node_exporter
sudo systemctl status node_exporter
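Once the service is running, a quick way to confirm that it serves metrics is to read one value from its /metrics endpoint. The sketch below assumes the exporter listens on port 9101 as configured above; `metric_value` is a hypothetical helper and only handles metrics without labels.

```shell
#!/bin/sh
# Print the value of an unlabelled metric ($1) read from exposition text on stdin.
metric_value() {
    awk -v name="$1" '$1 == name { print $2 }'
}

# Usage against the exporter configured above (needs the service running):
#   curl -s http://localhost:9101/metrics | metric_value node_load1

# Offline demonstration with made-up sample lines:
printf '%s\n' '# TYPE node_load1 gauge' 'node_load1 0.42' | metric_value node_load1
```

The offline demonstration prints 0.42; comment lines starting with # never match because their first field is not the metric name.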
3. Install Grafana
3.1 Official download page: https://grafana.com/grafana/download?edition=oss&platform=linux
3.2 Start on boot
Create the service unit:
sudo vim /etc/systemd/system/grafana.service
[Unit]
Description=Grafana server
Documentation=http://docs.grafana.org
[Service]
Type=simple
User=prometheus
Group=prometheus
Restart=on-failure
ExecStart=/opt/prometheus/grafana-v10.2.2/bin/grafana-server \
    --config=/opt/prometheus/grafana-v10.2.2/conf/grafana.ini \
    --homepath=/opt/prometheus/grafana-v10.2.2 \
    --http-port=3000
[Install]
WantedBy=multi-user.target
3.3 Edit the mail settings in grafana.ini
[smtp]
enabled = true
host = smtp.gmail.com:587
user = xuzan@lippu.ltd
# If the password contains # or ; you have to wrap it with triple quotes. Ex """#password;"""
password = <authorization code (app password) generated in 6.2>
cert_file =
key_file =
skip_verify = true
from_address = xuzan@lippu.ltd
from_name = Grafana
ehlo_identity =
startTLS_policy =
sudo systemctl daemon-reload
sudo systemctl enable grafana
sudo systemctl restart grafana
sudo systemctl status grafana
4. Install Alertmanager
4.1 Official download page: https://prometheus.io/download/
4.2 Add the startup unit
Edit the unit file:
sudo vim /etc/systemd/system/alertmanager.service
[Unit]
Description=Alert Manager
Wants=network-online.target
After=network-online.target
[Service]
Type=simple
User=prometheus
Group=prometheus
ExecStart=/opt/prometheus/alertmanager-0.26.0/alertmanager \
    --config.file=/opt/prometheus/alertmanager-0.26.0/alertmanager.yml \
    --storage.path=/opt/prometheus/alertmanager-0.26.0/data \
    --web.listen-address=:9071 \
    --cluster.listen-address=:9072
Restart=always
[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable alertmanager
sudo systemctl restart alertmanager
sudo systemctl status alertmanager -l
4.3 Point Prometheus at Alertmanager by editing new_prometheus.yml
Add:
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9071
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "alert.yml"
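Create alert.yml in the project root directory, next to new_prometheus.yml:
groups:
  - name: Prometheus alert
    rules:
      # Alert when any instance has been unreachable for more than 30s
      - alert: Service alert
        expr: up == 0
        for: 30s
        labels:
          severity: critical
        annotations:
          instance: "{{ $labels.instance }}"
          description: "{{ $labels.job }} service is down"
The fields of the alert rule:
alert: The name of the alert, here "Service alert".
expr: The expression that triggers the alert. Here, up == 0 checks whether the up metric equals 0. up is the standard metric Prometheus uses to record whether a target instance is reachable: 0 means unreachable, 1 means reachable.
for: How long the condition must hold before the alert fires. With 30s, the alert fires only after up has stayed at 0 for 30 seconds.
labels: Labels attached to the alert. This example sets a severity label (severity: critical) to mark the alert as critical.
annotations:
Extra information about the alert, usually shown in the notification. This example has two annotations:
instance: "{{ $labels.instance }}": shows the instance that triggered the alert.
description: "{{ $labels.job }} service is down": a descriptive message naming the service (job) that is down.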
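The effect of the for clause can be illustrated with a small simulation. This is a simplified model written for this guide, not Prometheus code: it assumes one evaluation every 30 seconds (the evaluation_interval set earlier) and treats the alert as pending until the condition has held for the configured duration. The function name `simulate_for` is hypothetical.

```shell
#!/bin/sh
# Simplified model of a rule with `for: <hold>`; not Prometheus code.
# Args: hold in seconds, evaluation interval in seconds, then one `up` sample per evaluation.
simulate_for() {
    hold=$1
    interval=$2
    shift 2
    active_since=-1   # time the condition first became true, -1 while it is false
    t=0
    for up in "$@"; do
        if [ "$up" -eq 0 ]; then
            if [ "$active_since" -lt 0 ]; then
                active_since=$t
            fi
            if [ $((t - active_since)) -ge "$hold" ]; then
                echo "t=${t}s firing"
            else
                echo "t=${t}s pending"
            fi
        else
            # The target is reachable again, so the timer resets.
            active_since=-1
            echo "t=${t}s inactive"
        fi
        t=$((t + interval))
    done
}

# Target is up, then down for two evaluations, then back up.
simulate_for 30 30 1 0 0 1
```

In this model the alert is pending at t=30s, fires at t=60s, and returns to inactive at t=90s, so a single failed scrape does not send a notification.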
4.4 Validate the configuration
./promtool check config new_prometheus.yml
Hot-reload the configuration:
curl -X POST http://localhost:9070/-/reload
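5. Add a data source in Grafana
5.1 Add the data source
Enter the address of the Prometheus data source here. Grafana and Prometheus run on the same server, so I entered localhost (with port 9070, as configured in 1.3).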
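As an alternative to clicking through the UI, Grafana can also load data sources from provisioning files at startup. The sketch below writes such a file; the helper name `write_datasource`, the data source name "Prometheus", and the file name prometheus.yml are assumptions, and the target directory shown in the comment assumes the default provisioning path under the homepath used in 3.2.

```shell
#!/bin/sh
# Sketch: write a Grafana data source provisioning file.
# The data source name "Prometheus" and the file name are assumptions.
write_datasource() {
    dir=$1   # e.g. /opt/prometheus/grafana-v10.2.2/conf/provisioning/datasources
    url=$2   # Prometheus address, e.g. http://localhost:9070
    mkdir -p "$dir" &&
    cat > "$dir/prometheus.yml" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: $url
    isDefault: true
EOF
}
```

After calling `write_datasource` with the two arguments, restart Grafana so that it reads the file.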
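5.2 Add dashboards. Dashboard catalog: https://grafana.com/grafana/dashboards/
Choose a dashboard.
Copy the dashboard ID.
In the Grafana UI, import the dashboard; it can be imported by ID.
Finally, select the data source you just added.
The dashboard is then displayed.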
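6. Configure Gmail for alert mail
6.1 Open the Gmail settings
Click "See all settings".
Finally, save the changes.
6.2 Enable 2-Step Verification and generate an authorization code
Open "App passwords".
Create an app with a custom name. A random string is generated; this authorization code is the password to use in the mail-sending settings.
7. Configure email alerts through Alertmanager
7.1 alertmanager.yml configuration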
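global:
  # Global settings
  smtp_smarthost: 'smtp.gmail.com:587' # SMTP server and port; here Gmail's SMTP server on port 587
  smtp_from: 'xuzan@lippu.ltd' # Sender address used for alert mail
  smtp_auth_username: 'xuzan@lippu.ltd' # SMTP auth username; here the mailbox address
  smtp_auth_password: '<app password generated in 6.2>' # SMTP auth password
route:
  # Routing
  group_by: ['severity'] # Label names to group alerts by; here the severity label set in alert.yml
  group_wait: 30s # Wait 30 seconds after a group forms so that alerts arriving for the same group are sent together
  group_interval: 5m # Wait 5 minutes before notifying about new alerts added to an already-notified group
  repeat_interval: 1h # Resend a still-firing alert every hour
  receiver: 'email-notifications' # Default receiver for alerts that match no child route
  routes:
    - match:
        severity: 'warning' # Matches alerts whose severity is warning
      receiver: 'email-notifications' # Receiver that handles these alerts
      group_by: ['severity'] # Group by the severity label
    - match:
        severity: 'critical' # Matches alerts whose severity is critical
      receiver: 'email-notifications' # Receiver that handles these alerts
      group_by: ['severity'] # Group by the severity label
      group_wait: 10s # Wait only 10 seconds before sending critical alerts
receivers:
  # Receiver definitions
  - name: 'email-notifications' # Receiver name
    email_configs:
      - to: 'xuzan@lippu.ltd' # Address that receives the alerts
        send_resolved: true # Also send a notification when the alert is resolved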
7.2 Update prometheus.yml
global:
  scrape_interval: 30s # Scrape targets every 30 seconds. The default is every 1 minute.
  evaluation_interval: 30s # Evaluate rules every 30 seconds. The default is every 1 minute.
  # Other global settings...
scrape_configs:
  # Prometheus self-monitoring
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9070"]
  - job_name: 'node_widgets'
    scheme: https # Use HTTPS
    tls_config:
      insecure_skip_verify: true # Skip certificate verification
    static_configs:
      - targets: ['xxx:443'] # Replace with the address and port of your server B
    metrics_path: '/prometheus/metrics' # Path where the Node Exporter metrics are exposed
  - job_name: 'production'
    scheme: https # Use HTTPS
    tls_config:
      insecure_skip_verify: true # Skip certificate verification
    static_configs:
      - targets: ['xxx.com:443'] # Replace with the address and port of your server
    metrics_path: '/v1/app/metrics' # Metrics path of the application
  - job_name: 'test'
    scheme: http # Use HTTP
    static_configs:
      - targets: ['23.8323.373.2437:8063'] # Replace with the address and port of your server
    metrics_path: '/v1/app/metrics' # Metrics path of the application
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9071
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "alert.yml"
  # For testing
  #- "test_alert.yml"
7.3 Alert rules
alert.yml defines alerts for a service going down and for memory, CPU, and disk usage.
groups:
  - name: Prometheus alert
    rules:
      # Alert when any instance has been unreachable for more than 30s
      - alert: Service down
        expr: up == 0
        for: 30s
        labels:
          severity: critical
        annotations:
          instance: "{{ $labels.instance }}"
          description: "{{ $labels.job }} is down, check immediately"
      # Memory usage above 80%
      - alert: Memory usage
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 80
        for: 30s
        labels:
          severity: warning
        annotations:
          instance: "{{ $labels.instance }}"
          description: "Memory usage is above 80%, current value: {{ $value }}%"
      # CPU usage above 80%
      - alert: CPU usage
        expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 30s
        labels:
          severity: critical
        annotations:
          instance: "{{ $labels.instance }}"
          description: "CPU usage is above 80%, current value: {{ $value }}%"
      # Disk usage above 80%
      - alert: Disk usage
        expr: (node_filesystem_size_bytes - node_filesystem_free_bytes) / node_filesystem_size_bytes * 100 > 80
        for: 30s
        labels:
          severity: warning
        annotations:
          instance: "{{ $labels.instance }}"
          description: "Disk usage is above 80%, current value: {{ $value }}%"
  # - name: myapp_alerts_down
  #   rules:
  #     - alert: Program is down (check immediately)
  #       expr: myapp_up == 1
  #       for: 30s
  #       labels:
  #         severity: warning
  #       annotations:
  #         summary: "Program has been down for more than 30s"
  #         description: "instance {{ $labels.instance }} with job {{ $labels.job }}"
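The memory and disk expressions share one formula: (total - available) / total * 100. A worked example with made-up numbers shows when the 80% threshold is crossed; `usage_percent` is a hypothetical helper, not part of Prometheus.

```shell
#!/bin/sh
# Percentage used, given total ($1) and available or free ($2) bytes.
usage_percent() {
    awk -v total="$1" -v avail="$2" 'BEGIN { printf "%.1f\n", (total - avail) / total * 100 }'
}

# Made-up sample: 16 GiB total, 2 GiB available.
usage_percent 17179869184 2147483648
```

This prints 87.5, which is above the threshold of 80, so the corresponding alert would fire once the value has stayed there for the 30s set in for.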
7.4 Test alert: test_alert.yml
groups:
  - name: test_alerts
    rules:
      - alert: TestAlert
        expr: vector(1)
        labels:
          severity: warning
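The mail path can also be tested without a rule file by posting an alert directly to Alertmanager's v2 API. This is a minimal sketch that assumes Alertmanager listens on localhost:9071 as configured in 4.2; the helper names `alert_payload` and `send_test_alert` are hypothetical.

```shell
#!/bin/sh
# Sketch: post a hand-made alert straight to Alertmanager.
ALERTMANAGER_URL="${ALERTMANAGER_URL:-http://localhost:9071}"

# Print a one-alert JSON payload with the given alertname ($1) and severity ($2).
alert_payload() {
    printf '[{"labels":{"alertname":"%s","severity":"%s"}}]\n' "$1" "$2"
}

# Send the payload (needs Alertmanager running; not called automatically).
send_test_alert() {
    alert_payload "$1" "$2" |
        curl -fsS -X POST -H 'Content-Type: application/json' \
            --data-binary @- "$ALERTMANAGER_URL/api/v2/alerts"
}

# Show the payload that would be sent.
alert_payload TestAlert warning
```

Calling `send_test_alert TestAlert warning` sends an alert whose severity label matches the warning route, so a mail should reach the email-notifications receiver after the group wait.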
7.5 The alert template can also be customized. Below is the official template:
https://raw.githubusercontent.com/prometheus/alertmanager/master/template/email.html
<!--
Style and HTML derived from https://github.com/mailgun/transactional-email-templates

The MIT License (MIT)

Copyright (c) 2014 Mailgun

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
-->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="viewport" content="width=device-width" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>{{ template "__subject" . }}</title>
<style>
/* -------------------------------------GLOBALA very basic CSS reset
------------------------------------- */
* {margin: 0;font-family: "Helvetica Neue", Helvetica, Arial, sans-serif;box-sizing: border-box;font-size: 14px;
}img {max-width: 100%;
}body {-webkit-font-smoothing: antialiased;-webkit-text-size-adjust: none;width: 100% !important;height: 100%;line-height: 1.6em;/* 1.6em * 14px = 22.4px, use px to get airier line-height also in Thunderbird, and Yahoo!, Outlook.com, AOL webmail clients *//*line-height: 22px;*/
}/* Let's make sure all tables have defaults */
table td {vertical-align: top;
}/* -------------------------------------BODY & CONTAINER
------------------------------------- */
body {background-color: #f6f6f6;
}.body-wrap {background-color: #f6f6f6;width: 100%;
}.container {display: block !important;max-width: 600px !important;margin: 0 auto !important;/* makes it centered */clear: both !important;
}.content {max-width: 600px;margin: 0 auto;display: block;padding: 20px;
}/* -------------------------------------HEADER, FOOTER, MAIN
------------------------------------- */
.main {background-color: #fff;border: 1px solid #e9e9e9;border-radius: 3px;
}.content-wrap {padding: 30px;
}.content-block {padding: 0 0 20px;
}.header {width: 100%;margin-bottom: 20px;
}.footer {width: 100%;clear: both;color: #999;padding: 20px;
}
.footer p, .footer a, .footer td {color: #999;font-size: 12px;
}/* -------------------------------------TYPOGRAPHY
------------------------------------- */
h1, h2, h3 {font-family: "Helvetica Neue", Helvetica, Arial, "Lucida Grande", sans-serif;color: #000;margin: 40px 0 0;line-height: 1.2em;font-weight: 400;
}h1 {font-size: 32px;font-weight: 500;/* 1.2em * 32px = 38.4px, use px to get airier line-height also in Thunderbird, and Yahoo!, Outlook.com, AOL webmail clients *//*line-height: 38px;*/
}h2 {font-size: 24px;/* 1.2em * 24px = 28.8px, use px to get airier line-height also in Thunderbird, and Yahoo!, Outlook.com, AOL webmail clients *//*line-height: 29px;*/
}h3 {font-size: 18px;/* 1.2em * 18px = 21.6px, use px to get airier line-height also in Thunderbird, and Yahoo!, Outlook.com, AOL webmail clients *//*line-height: 22px;*/
}h4 {font-size: 14px;font-weight: 600;
}p, ul, ol {margin-bottom: 10px;font-weight: normal;
}
p li, ul li, ol li {margin-left: 5px;list-style-position: inside;
}/* -------------------------------------LINKS & BUTTONS
------------------------------------- */
a {color: #348eda;text-decoration: underline;
}.btn-primary {text-decoration: none;color: #FFF;background-color: #348eda;border: solid #348eda;border-width: 10px 20px;line-height: 2em;/* 2em * 14px = 28px, use px to get airier line-height also in Thunderbird, and Yahoo!, Outlook.com, AOL webmail clients *//*line-height: 28px;*/font-weight: bold;text-align: center;cursor: pointer;display: inline-block;border-radius: 5px;text-transform: capitalize;
}/* -------------------------------------OTHER STYLES THAT MIGHT BE USEFUL
------------------------------------- */
.last {margin-bottom: 0;
}.first {margin-top: 0;
}.aligncenter {text-align: center;
}.alignright {text-align: right;
}.alignleft {text-align: left;
}.clear {clear: both;
}/* -------------------------------------ALERTSChange the class depending on warning email, good email or bad email
------------------------------------- */
.alert {font-size: 16px;color: #fff;font-weight: 500;padding: 20px;text-align: center;border-radius: 3px 3px 0 0;
}
.alert a {color: #fff;text-decoration: none;font-weight: 500;font-size: 16px;
}
.alert.alert-warning {background-color: #E6522C;
}
.alert.alert-bad {background-color: #D0021B;
}
.alert.alert-good {background-color: #68B90F;
}/* -------------------------------------INVOICEStyles for the billing table
------------------------------------- */
.invoice {margin: 40px auto;text-align: left;width: 80%;
}
.invoice td {padding: 5px 0;
}
.invoice .invoice-items {width: 100%;
}
.invoice .invoice-items td {border-top: #eee 1px solid;
}
.invoice .invoice-items .total td {border-top: 2px solid #333;border-bottom: 2px solid #333;font-weight: 700;
}/* -------------------------------------RESPONSIVE AND MOBILE FRIENDLY STYLES
------------------------------------- */
@media only screen and (max-width: 640px) {body {padding: 0 !important;}h1, h2, h3, h4 {font-weight: 800 !important;margin: 20px 0 5px !important;}h1 {font-size: 22px !important;}h2 {font-size: 18px !important;}h3 {font-size: 16px !important;}.container {padding: 0 !important;width: 100% !important;}.content {padding: 0 !important;}.content-wrap {padding: 10px !important;}.invoice {width: 100% !important;}
}
</style>
</head><body itemscope itemtype="https://schema.org/EmailMessage"><table class="body-wrap"><tr><td></td><td class="container" width="600"><div class="content"><table class="main" width="100%" cellpadding="0" cellspacing="0"><tr>{{ if gt (len .Alerts.Firing) 0 }}<td class="alert alert-warning">{{ .Alerts | len }} alert{{ if gt (len .Alerts) 1 }}s{{ end }} for {{ range .GroupLabels.SortedPairs }}{{ .Name }}={{ .Value }}{{ end }}</td>{{ else }}<td class="alert alert-good">{{ .Alerts | len }} alert{{ if gt (len .Alerts) 1 }}s{{ end }} for {{ range .GroupLabels.SortedPairs }}{{ .Name }}={{ .Value }} {{ end }}</td>{{ end }}</tr><tr><td class="content-wrap"><table width="100%" cellpadding="0" cellspacing="0"><tr><td class="content-block"><a href='{{ template "__alertmanagerURL" . }}' class="btn-primary">View in {{ template "__alertmanager" . }}</a></td></tr>{{ if gt (len .Alerts.Firing) 0 }}<tr><td class="content-block"><strong>[{{ .Alerts.Firing | len }}] Firing</strong></td></tr>{{ end }}{{ range .Alerts.Firing }}<tr><td class="content-block"><strong>Labels</strong><br />{{ range .Labels.SortedPairs }}{{ .Name }} = {{ .Value }}<br />{{ end }}{{ if gt (len .Annotations) 0 }}<strong>Annotations</strong><br />{{ end }}{{ range .Annotations.SortedPairs }}{{ .Name }} = {{ .Value }}<br />{{ end }}<a href="{{ .GeneratorURL }}">Source</a><br /></td></tr>{{ end }}{{ if gt (len .Alerts.Resolved) 0 }}{{ if gt (len .Alerts.Firing) 0 }}<tr><td class="content-block"><br /><hr /><br /></td></tr>{{ end }}<tr><td class="content-block"><strong>[{{ .Alerts.Resolved | len }}] Resolved</strong></td></tr>{{ end }}{{ range .Alerts.Resolved }}<tr><td class="content-block"><strong>Labels</strong><br />{{ range .Labels.SortedPairs }}{{ .Name }} = {{ .Value }}<br />{{ end }}{{ if gt (len .Annotations) 0 }}<strong>Annotations</strong><br />{{ end }}{{ range .Annotations.SortedPairs }}{{ .Name }} = {{ .Value }}<br />{{ end }}<a href="{{ .GeneratorURL }}">Source</a><br /></td></tr>{{ end 
}}</table></td></tr></table><div class="footer"><table width="100%"><tr><td class="aligncenter content-block"><a href='{{ .ExternalURL }}'>Sent by {{ template "__alertmanager" . }}</a></td></tr></table></div></div></td><td></td></tr>
</table></body>
</html>