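17. K8s Observability: Distributed Tracing
Table of Contents
- 17. K8s Observability: Distributed Tracing
- 1. Introducing SkyWalking
- 1.1 Why a Distributed Tracing Platform Is Needed
- 1.2 Core Components and Working Principle of Distributed Tracing
- 1.2.1 Core Concepts of Distributed Tracing
- 1.2.2 How Distributed Tracing Works
- 1.3 What Is SkyWalking?
- 1.4 SkyWalking Architecture
- 1.5 SkyWalking Core Terminology
- 2. SkyWalking Cluster Installation
- 2.1 Cluster Planning
- 2.2 Installing the SkyWalking Cluster
- 2.3 Connecting a Java Service to SkyWalking
- 2.4 Connecting a Go Service to SkyWalking
- 2.5 Cleaning Up the Environment
- 3. Distributed Tracing Practice Project
- 3.1 Service Deployment
- 3.1.1 Deploying the Database (reusing the configuration from the previous lab)
- 3.1.2 Starting the order Service
- 3.1.3 Deploying the handler Service (reusing the configuration from the previous lab)
- 3.1.4 Deploying the receive Service
- 3.1.5 Deploying the Frontend Service
- 3.2 Service Access and Monitoring
- 3.3 Simulating a Failure
- 4. SkyWalking Alerting
- 4.1 SkyWalking Alert Notifications
- 4.2 SkyWalking Alert Rules
- 4.3 Configuring a DingTalk Alert Robot
- 4.4 Connecting SkyWalking to DingTalk Alerts
- 4.5 Custom Alert Rules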
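1. Introducing SkyWalking
1.1 Why a Distributed Tracing Platform Is Needed
- Quickly locate the point of failure
- Quickly locate performance bottlenecks
- Understand the dependencies between services
- Visualize traffic across the whole system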
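1.2 Core Components and Working Principle of Distributed Tracing
1.2.1 Core Concepts of Distributed Tracing
- Trace: The complete processing of a single request is called a Trace. It covers the whole path from the client issuing the request to the backend finishing its work, and it consists of multiple Spans.
- Span: A Span represents one unit of work within a Trace, such as a function call or an HTTP request. Each Span records an operation name, a start time, an end time, and operation-related metadata. Spans form parent-child relationships, and together the Spans make up one Trace.
- Trace ID and Span ID: Every Trace has a unique Trace ID, every Span has a unique Span ID, and each Span also holds a reference to its parent Span.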
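The concepts above can be sketched as a small data model. This is a minimal illustration, not SkyWalking's internal representation: the `Span` class, its field names, and the `build_tree` helper are hypothetical and exist only to show how a Trace ID, Span IDs, and parent references fit together.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional
import uuid


@dataclass
class Span:
    """One unit of work inside a trace (hypothetical model)."""
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]  # None marks the root span
    operation: str
    start_ms: int
    end_ms: int
    tags: Dict[str, str] = field(default_factory=dict)

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def build_tree(spans: List[Span]) -> Dict[Optional[str], List[Span]]:
    """Group spans by parent span ID so the call hierarchy can be walked."""
    children: Dict[Optional[str], List[Span]] = {}
    for span in spans:
        children.setdefault(span.parent_span_id, []).append(span)
    return children


# All spans of one request share the same trace ID.
trace_id = uuid.uuid4().hex
spans = [
    Span(trace_id, "1", None, "GET /orders", 0, 120),
    Span(trace_id, "2", "1", "OrderService.list", 5, 110),
    Span(trace_id, "3", "2", "MySQL/query", 20, 90),
]
tree = build_tree(spans)
root = tree[None][0]
```

Walking `tree` from `root` downward reproduces the call hierarchy, and `duration_ms` on each Span shows where the time of the request was spent.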
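1.2.2 How Distributed Tracing Works
1. The client sends a request
2. Service A starts processing the request and creates the initial Trace and Span
3. Service A forwards the request to service B and passes along the Trace ID and Span ID
4. Service B uses the propagated information to create a new Span and marks its parent Span
5. After all services finish processing, each of them sends the Spans it produced to the tracing platform for aggregation
6. Users can view the details of the whole Trace in the UI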
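The steps above can be simulated in a few lines. This is a sketch under stated assumptions: the header names `x-trace-id` and `x-parent-span-id`, the `collected` list standing in for the tracing backend, and the two service functions are all hypothetical. SkyWalking agents use their own propagation header format and report to the OAP server instead.

```python
import itertools
from typing import Dict, List, Optional

_ids = itertools.count(1)
collected: List[dict] = []  # stands in for the tracing backend


def start_span(operation: str, headers: Optional[Dict[str, str]] = None) -> dict:
    """Create a span, continuing the caller's trace when headers carry one."""
    headers = headers or {}
    return {
        "trace_id": headers.get("x-trace-id") or f"trace-{next(_ids)}",
        "span_id": f"span-{next(_ids)}",
        "parent_span_id": headers.get("x-parent-span-id"),
        "operation": operation,
    }


def outgoing_headers(span: dict) -> Dict[str, str]:
    """Context a service attaches to a downstream call (step 3)."""
    return {"x-trace-id": span["trace_id"], "x-parent-span-id": span["span_id"]}


def service_b(headers: Dict[str, str]) -> str:
    span = start_span("service-b/handle", headers)  # step 4
    collected.append(span)                          # step 5
    return "ok"


def service_a() -> str:
    span = start_span("service-a/handle")           # step 2
    result = service_b(outgoing_headers(span))      # step 3
    collected.append(span)                          # step 5
    return result


service_a()
```

After the call, `collected` holds two spans with the same trace ID, and the span of service B points at the span of service A as its parent, which is exactly the information a UI needs to draw the call chain.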
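1.3 What Is SkyWalking?
SkyWalking is an application performance monitoring (APM) tool and observability analysis platform for distributed systems. It provides distributed tracing, metrics monitoring, diagnostic information, service mesh telemetry analysis, alerting, and a visual UI, helping developers and operations teams understand and manage their applications and services.
Core features:
- Distributed tracing: SkyWalking generates trace data for requests, which shows what happens along the whole call chain and helps locate performance bottlenecks or the root cause of a problem
- Metrics analysis: measures service health through key performance indicators (KPIs) such as response time, throughput, and success rate
- Alerting: supports custom alert rules and sends notifications automatically when an anomaly is detected
- Rich UI: an intuitive web UI for viewing trace data, metrics, and the service topology
- Low intrusiveness: code-level monitoring is implemented through bytecode injection, so services can be connected without changing business logic
- Multi-language support: besides Java, it supports .NET Core, Node.js, Python, Go, and other languages
- Platform integration: integrates with service meshes and Kubernetes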
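1.4 SkyWalking Architecture
1.5 SkyWalking Core Terminology
- Service: one application, or a group of applications, that provides the same functionality or business logic. It can be a microservice, a web service, a database, or another kind of backend service
- Instance: one concrete running instance of a service. In a distributed environment the same service may be deployed on several servers or containers, and the copy running on each server or container is one Instance
- Endpoint: a specific path or interface of a service that can be called from outside. An endpoint is the entry point through which the service exposes its functionality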
2. SkyWalking Cluster Installation
2.1 Cluster Planning
Hostname | Physical IP | OS | Resources | Role |
---|---|---|---|---|
k8s-master01 | 192.168.200.50 | Rocky 9.4 | 4 cores, 8 GB | Master node |
k8s-node01 | 192.168.200.51 | Rocky 9.4 | 4 cores, 8 GB | Node01 |
k8s-node02 | 192.168.200.52 | Rocky 9.4 | 4 cores, 8 GB | Node02 |
2.2 Installing the SkyWalking Cluster
# Add the SkyWalking Helm repository
[root@k8s-master01 ~]# export REPO=skywalking
[root@k8s-master01 ~]# helm repo add ${REPO} https://apache.jfrog.io/artifactory/skywalking-helm
# Download the SkyWalking chart
[root@k8s-master01 ~]# helm pull skywalking/skywalking
# Unpack the chart archive:
[root@k8s-master01 ~]# tar xf skywalking-4.3.0.tgz
[root@k8s-master01 ~]# cd skywalking
[root@k8s-master01 skywalking]# vim values.yaml
[root@k8s-master01 skywalking]# cat values.yaml
# Change the Elasticsearch configuration:
elasticsearch:
  antiAffinity: soft
  clusterHealthCheckParams: wait_for_status=green&timeout=10s
  clusterName: es-cluster
  config:
    host: elasticsearch
    password: admin
    port:
      http: 9200
    user: admin
  enabled: true
  esMajorVersion: "7"
  image: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/elasticsearch
  imagePullPolicy: IfNotPresent
  imageTag: 7.5.1
  persistence:
    annotations: {}
    enabled: true
  replicas: 3
  resources:
    limits:
      cpu: 2000m
      memory: 3Gi
    requests:
      cpu: 1000m
      memory: 2Gi
  volumeClaimTemplate:
    storageClassName: nfs-csi
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 30Gi
initContainer:
  image: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/busybox
  tag: "1.30"
# Change the OAP resource configuration:
oap:
  image:
    pullPolicy: IfNotPresent
    repository: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/skywalking-oap-server
    tag: 10.2.0
  javaOpts: -Xmx2g -Xms2g
  replicas: 3
  resources:
    limits:
      cpu: 2000m
      memory: 3Gi
    requests:
      cpu: 1000m
      memory: 2Gi
  storageType: elasticsearch
# Change the UI configuration:
ui:
  image:
    pullPolicy: IfNotPresent
    repository: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/skywalking-ui
    tag: 10.2.0
  replicas: 3
  service:
    annotations: {}
    externalPort: 80
    internalPort: 8080
    type: NodePort
# Adjust the OAP liveness and readiness probes:
[root@k8s-master01 skywalking]# vim templates/oap-deployment.yaml
[root@k8s-master01 skywalking]# sed -n "91,100p" templates/oap-deployment.yaml
        livenessProbe:
          tcpSocket:
            port: 12800
          initialDelaySeconds: 300
          periodSeconds: 20
        readinessProbe:
          tcpSocket:
            port: 12800
          initialDelaySeconds: 300
          periodSeconds: 20
# Delete the conflicting resources
[root@k8s-master01 skywalking]# rm -rf charts/elasticsearch/templates/pod*
# Install:
[root@k8s-master01 skywalking]# helm install skywalking -n skywalking . --create-namespace
# Check the installation status:
[root@k8s-master01 skywalking]# kubectl get po -n skywalking
NAME                              READY   STATUS    RESTARTS   AGE
es-cluster-master-0               1/1     Running   0          13m
es-cluster-master-1               1/1     Running   0          13m
es-cluster-master-2               1/1     Running   0          13m
skywalking-es-init-mkvw7          1/1     Running   0          13m
skywalking-oap-6d8f594b7c-7w785   1/1     Running   0          13m
skywalking-oap-6d8f594b7c-p4z64   1/1     Running   0          13m
skywalking-oap-6d8f594b7c-vnp8t   1/1     Running   0          13m
skywalking-ui-774674cc7-qcm79     1/1     Running   0          13m
skywalking-ui-774674cc7-qhgg8     1/1     Running   0          13m
skywalking-ui-774674cc7-qwkjm     1/1     Running   0          13m
# Check the service
[root@k8s-master01 skywalking]# kubectl get svc skywalking-ui -n skywalking
NAME            TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
skywalking-ui   NodePort   10.108.110.98   <none>        80:31319/TCP   14m
Access skywalking-ui through the NodePort shown above (31319 in this example):
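2.3 Connecting a Java Service to SkyWalking
Java Agent reference documentation:
Java:
- JAVA_TOOL_OPTIONS: JVM startup options. The agent can be loaded through this variable, for example -javaagent:/skywalking/agent/skywalking-agent.jar
- SW_AGENT_NAME: the service name. The suggested format is <group>::<logical name>; the recommended setting is namespace::service name
- SW_AGENT_INSTANCE_NAME: the instance name, normally used to tell apart different instances of the same service. It defaults to UUID@hostname; using the Pod name is recommended
- SW_AGENT_COLLECTOR_BACKEND_SERVICES: the address of the SkyWalking OAP server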
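A manifest can assemble the recommended namespace::service name value as "$(NAMESPACE)::$(APP)", relying on the fact that Kubernetes expands `$(VAR)` references to variables defined earlier in the same container. The sketch below imitates that expansion to show what the agent ends up seeing. It is a simplified model: `expand_env` is a hypothetical helper and ignores details such as `$$` escaping.

```python
import re
from typing import Dict, List, Tuple

_REF = re.compile(r"\$\(([A-Za-z_][A-Za-z0-9_]*)\)")


def expand_env(env: List[Tuple[str, str]]) -> Dict[str, str]:
    """Resolve $(VAR) references against variables defined earlier in the list.

    References that cannot be resolved are left unchanged.
    """
    resolved: Dict[str, str] = {}
    for name, value in env:
        resolved[name] = _REF.sub(
            lambda m: resolved.get(m.group(1), m.group(0)), value
        )
    return resolved


# NAMESPACE and APP would come from the downward API (fieldRef) in a real Pod.
pod_env = expand_env([
    ("NAMESPACE", "demo"),
    ("APP", "demo-handler"),
    ("SW_AGENT_NAME", "$(NAMESPACE)::$(APP)"),
    ("BROKEN", "$(NOT_DEFINED)"),
])
```

Because expansion only sees variables defined earlier, `NAMESPACE` and `APP` have to appear before `SW_AGENT_NAME` in the `env` list.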
[root@k8s-master01 skywalking]# mkdir demo/
[root@k8s-master01 skywalking]# cd demo/
[root@k8s-master01 demo]# vim demo-handler-deploy-sw.yaml
[root@k8s-master01 demo]# cat demo-handler-deploy-sw.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: demo-handler
  name: demo-handler
  namespace: demo
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: demo-handler
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: demo-handler
    spec:
      volumes: # Add the volume and the init container
      - name: skywalking-agent
        emptyDir: {}
      initContainers:
      - name: agent-container
        image: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/skywalking-java-agent:9.4.0-java8
        volumeMounts:
        - name: skywalking-agent
          mountPath: /agent
        command: [ "/bin/sh" ]
        args: [ "-c", "cp -R /skywalking/agent /agent/ ; mkdir -p /agent/agent/logs/ ; chown -R 1001.1001 /agent" ]
      containers:
      - env:
        - name: SPRING_PROFILES_ACTIVE
          value: k8supgrade
        - name: SERVER_PORT
          value: "8080"
        - name: JAVA_TOOL_OPTIONS # Add the environment variables
          value: "-javaagent:/skywalking/agent/skywalking-agent.jar"
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: APP
          valueFrom:
            fieldRef:
              fieldPath: metadata.labels['app']
        - name: SW_AGENT_NAME
          value: "$(NAMESPACE)::$(APP)"
        - name: SW_AGENT_INSTANCE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: SW_AGENT_COLLECTOR_BACKEND_SERVICES
          value: skywalking-oap.skywalking:11800
        image: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/demo-handler:v1-upgrade
        imagePullPolicy: IfNotPresent
        volumeMounts: # Add the mount
        - name: skywalking-agent
          mountPath: /skywalking
        livenessProbe:
          failureThreshold: 2
          initialDelaySeconds: 30
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 8080
          timeoutSeconds: 2
        name: demo-handler
        readinessProbe:
          failureThreshold: 2
          initialDelaySeconds: 30
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 8080
          timeoutSeconds: 2
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
# Next, create the service and test it:
[root@k8s-master01 demo]# kubectl create namespace demo
[root@k8s-master01 demo]# kubectl create -f demo-handler-deploy-sw.yaml -n demo
# Check the Pod
[root@k8s-master01 demo]# kubectl get po -n demo -owide
NAME                            READY   STATUS    RESTARTS   AGE   IP              NODE         NOMINATED NODE   READINESS GATES
demo-handler-5b6f9dd9c7-88pr6   1/1     Running   0          77s   172.16.58.233   k8s-node02   <none>           <none>
# Test access (repeat the request a few times)
[root@k8s-master01 demo]# curl 172.16.58.233:8080/api/generate
O4E,\1L!u-bzTE[7Fn#VCS+eK?fwcp|k
View the SkyWalking charts:
Topology graph
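2.4 Connecting a Go Service to SkyWalking
Go Agent reference documentation:
Go:
- SW_AGENT_REPORTER_GRPC_BACKEND_SERVICE: the address of the SkyWalking OAP server
- SW_AGENT_NAME: the service name. The suggested format is <group>::<logical name>; the recommended setting is namespace::service name
- SW_AGENT_INSTANCE_NAME: the instance name, normally used to tell apart different instances of the same service. It defaults to UUID@hostname; using the Pod name is recommended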
# Download the test program:
[root@habor ~]# git clone https://gitee.com/dukuan/demo-order.git
# Write the Dockerfile
[root@habor ~]# cd demo-order-master
[root@habor demo-order-master]# vim Dockerfile
[root@habor demo-order-master]# cat Dockerfile
FROM crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/skywalking-go:0.5.0-go1.22 AS builder
COPY ./ /go/src/
WORKDIR /go/src/
RUN export GO111MODULE=on && \
    export GOPROXY=https://goproxy.cn,direct && \
    skywalking-go-agent -inject /go/src && \
    go build -o ./order -toolexec="skywalking-go-agent" -a /go/src
FROM crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/alpine:3.20
COPY --from=builder /go/src/order .
CMD [ "./order" ]
# Build the image
[root@habor demo-order-master]# docker build -t crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/demo-order:v1 .
# Push the image to the registry
[root@habor demo-order-master]# docker push crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/demo-order:v1
[root@k8s-master01 demo]# vim mysql.yaml
[root@k8s-master01 demo]# cat mysql.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: mysql
  name: mysql
  namespace: demo
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: mysql
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: mysql
    spec:
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: mysql-data
      containers:
      - env:
        - name: MYSQL_ROOT_PASSWORD
          value: password
        image: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/mysql:8.0.20
        imagePullPolicy: IfNotPresent
        name: mysql
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
[root@k8s-master01 demo]# vim mysql-svc.yaml
[root@k8s-master01 demo]# cat mysql-svc.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: mysql
  name: mysql
  namespace: demo
spec:
  ports:
  - nodePort: 32541
    port: 3306
    protocol: TCP
    targetPort: 3306
  selector:
    app: mysql
  sessionAffinity: None
  type: NodePort
[root@k8s-master01 demo]# vim mysql-pvc.yaml
[root@k8s-master01 demo]# cat mysql-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-data
  namespace: demo
spec:
  resources:
    requests:
      storage: 5Gi
  volumeMode: Filesystem
  storageClassName: nfs-csi
  accessModes:
  - ReadWriteOnce
# Create the base component services:
[root@k8s-master01 demo]# kubectl create -f mysql.yaml -f mysql-svc.yaml -f mysql-pvc.yaml -n demo
# Check the Pods
[root@k8s-master01 demo]# kubectl get po -n demo
NAME                     READY   STATUS    RESTARTS   AGE
....
mysql-6d698b4676-8hsn8   1/1     Running   0          3m22s
# Configure the database:
[root@k8s-master01 demo]# kubectl exec -it mysql-6d698b4676-8hsn8 -n demo -- bash
root@mysql-6d698b4676-8hsn8:/# mysql -uroot -ppassword
....
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> create database orders;
Query OK, 1 row affected (0.01 sec)
mysql> CREATE USER 'order'@'%' IDENTIFIED BY 'password';
Query OK, 0 rows affected (0.01 sec)
mysql> GRANT ALL ON orders.* TO 'order'@'%';
Query OK, 0 rows affected (0.02 sec)
# Because the probe is injected into the Go code at compile time, no special configuration is needed at startup; it is enough to keep the relevant environment variables:
[root@k8s-master01 demo]# vim demo-order-deploy.yaml
[root@k8s-master01 demo]# cat demo-order-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: demo-order
  name: demo-order
  namespace: demo
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: demo-order
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: demo-order
    spec:
      containers:
      - env:
        - name: MYSQL_HOST
          value: mysql
        - name: MYSQL_PORT
          value: "3306"
        - name: MYSQL_USER
          value: order
        - name: MYSQL_PASSWORD
          value: password
        - name: MYSQL_DB
          value: orders
        # Add the variables
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: APP
          valueFrom:
            fieldRef:
              fieldPath: metadata.labels['app']
        - name: SW_AGENT_NAME
          value: "$(NAMESPACE)::$(APP)"
        #- name: SW_AGENT_NAME
        #  value: demo::demo-order
        - name: SW_AGENT_INSTANCE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: SW_AGENT_REPORTER_GRPC_BACKEND_SERVICE
          value: skywalking-oap.skywalking:11800
        image: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/demo-order:v2
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 2
          initialDelaySeconds: 30
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 8080
          timeoutSeconds: 2
        name: demo-order
        readinessProbe:
          failureThreshold: 2
          initialDelaySeconds: 30
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 8080
          timeoutSeconds: 2
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
# Next, create the service and test it:
[root@k8s-master01 demo]# kubectl create -f demo-order-deploy.yaml -n demo
# Check the Pod
[root@k8s-master01 demo]# kubectl get po -n demo -owide
NAME                        READY   STATUS    RESTARTS   AGE   IP              NODE         NOMINATED NODE   READINESS GATES
demo-order-755cdc96-ltlzg   1/1     Running   0          65s   172.16.58.239   k8s-node02   <none>           <none>
# Test access (repeat the request a few times)
[root@k8s-master01 demo]# curl 172.16.58.239:8080/orders
[{"id":1,"name":"Order 1","price":10},{"id":2,"name":"Order 2","price":20}]
View the SkyWalking charts:
The database is detected automatically
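2.5 Cleaning Up the Environment
[root@k8s-master01 demo]# kubectl delete deploy -n demo --all
3. Distributed Tracing Practice Project
After the steps above, SkyWalking is receiving trace data from both Go and Java services. The following complete project consolidates what has been covered so far.
Project architecture:
3.1 Service Deployment
3.1.1 Deploying the Database (reusing the configuration from the previous lab)
# Deploy the database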
[root@k8s-master01 demo]# kubectl create -f mysql.yaml -f mysql-svc.yaml -f
[root@k8s-master01 demo]# kubectl get po -n demo
NAME READY STATUS RESTARTS AGE
mysql-6d698b4676-sk8hj 1/1 Running 0 17s
# Create the account
[root@k8s-master01 demo]# kubectl exec -it mysql-6d698b4676-sk8hj -n demo -- bash
root@mysql-6d698b4676-sk8hj:/# mysql -uroot -ppassword
....
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> create database orders;
Query OK, 1 row affected (0.04 sec)
mysql> CREATE USER 'order'@'%' IDENTIFIED BY 'password';
Query OK, 0 rows affected (0.02 sec)
mysql> GRANT ALL ON orders.* TO 'order'@'%';
Query OK, 0 rows affected (0.01 sec)
3.1.2 Starting the order Service
# Start the order service. It is a Go program, so it pushes monitoring data without any extra configuration changes:
# Reuse the Deployment from the previous lab and create a Service
[root@k8s-master01 demo]# vim demo-order-svc.yaml
[root@k8s-master01 demo]# cat demo-order-svc.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: order
  name: order
  namespace: demo
spec:
  ports:
  - name: http-web
    port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: demo-order
  sessionAffinity: None
  type: ClusterIP
# Configure an external domain name
[root@k8s-master01 demo]# vim demo-order-ingress.yaml
[root@k8s-master01 demo]# cat demo-order-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo-order
  namespace: demo
spec:
  ingressClassName: nginx
  rules:
  - host: demo.test.com
    http:
      paths:
      - backend:
          service:
            name: order
            port:
              number: 80
        path: /orders
        pathType: ImplementationSpecific
# Create the resources
[root@k8s-master01 demo]# kubectl create -f demo-order-deploy.yaml -f demo-order-svc.yaml -f demo-order-ingress.yaml -n demo
# Check the service status:
[root@k8s-master01 demo]# kubectl get pod -n demo -owide
NAME                        READY   STATUS    RESTARTS   AGE     IP              NODE         NOMINATED NODE   READINESS GATES
demo-order-755cdc96-8qlc9   1/1     Running   0          2m54s   172.16.58.245   k8s-node02   <none>           <none>
mysql-6d698b4676-sk8hj      1/1     Running   0          111m    172.16.58.241   k8s-node02   <none>           <none>
[root@k8s-master01 demo]# kubectl get svc,ingress -n demo
NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
service/mysql   NodePort    10.111.54.12     <none>        3306:32541/TCP   111m
service/order   ClusterIP   10.101.166.166   <none>        80/TCP           3m1s
NAME                                   CLASS   HOSTS           ADDRESS          PORTS   AGE
ingress.networking.k8s.io/demo-order   nginx   demo.test.com   192.168.200.52   80      3m1s
# Test access:
[root@k8s-master01 demo]# echo "192.168.200.52 demo.test.com" >> /etc/hosts
[root@k8s-master01 demo]# curl demo.test.com/orders
[{"id":1,"name":"Order 1","price":10},{"id":2,"name":"Order 2","price":20},{"id":3,"name":"Order 1","price":10},{"id":4,"name":"Order 2","price":20}]
3.1.3 Deploying the handler Service (reusing the configuration from the previous lab)
# Deploy the handler service
[root@k8s-master01 demo]# kubectl create -f demo-handler-deploy-sw.yaml -f demo-handler-svc.yaml -n demo
3.1.4 Deploying the receive Service
[root@k8s-master01 demo]# vim demo-receive-deploy.yaml
[root@k8s-master01 demo]# cat demo-receive-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: demo-receive
  name: demo-receive
  namespace: demo
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: demo-receive
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: demo-receive
    spec:
      volumes:
      - name: skywalking-agent
        emptyDir: {}
      initContainers:
      - name: agent-container
        image: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/skywalking-java-agent:9.4.0-java8
        volumeMounts:
        - name: skywalking-agent
          mountPath: /agent
        command: [ "/bin/sh" ]
        args: [ "-c", "cp -R /skywalking/agent /agent/ ; mkdir -p /agent/agent/logs/ ; chown -R 1001.1001 /agent" ]
      containers:
      - env:
        - name: SPRING_PROFILES_ACTIVE
          value: k8supgrade
        - name: SERVER_PORT
          value: "8080"
        - name: JAVA_TOOL_OPTIONS
          value: "-javaagent:/skywalking/agent/skywalking-agent.jar"
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: APP
          valueFrom:
            fieldRef:
              fieldPath: metadata.labels['app']
        - name: SW_AGENT_NAME
          value: "$(NAMESPACE)::$(APP)"
        - name: SW_AGENT_INSTANCE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: SW_AGENT_COLLECTOR_BACKEND_SERVICES
          value: skywalking-oap.skywalking:11800
        volumeMounts:
        - name: skywalking-agent
          mountPath: /skywalking
        image: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/demo-receive:v1-upgrade
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 2
          initialDelaySeconds: 30
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 8080
          timeoutSeconds: 2
        name: demo-receive
        readinessProbe:
          failureThreshold: 2
          initialDelaySeconds: 30
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 8080
          timeoutSeconds: 2
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
[root@k8s-master01 demo]# vim demo-receive-svc.yaml
[root@k8s-master01 demo]# cat demo-receive-svc.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: demo-receive
  name: demo-receive
  namespace: demo
spec:
  ports:
  - name: http-web
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app: demo-receive
  sessionAffinity: None
  type: ClusterIP
[root@k8s-master01 demo]# vim demo-receive-ingress.yaml
[root@k8s-master01 demo]# cat demo-receive-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /$2
  name: demo-receive
  namespace: demo
spec:
  ingressClassName: nginx
  rules:
  - host: demo.test.com
    http:
      paths:
      - backend:
          service:
            name: demo-receive
            port:
              number: 8080
        path: /receiveapi(/|$)(.*)
        pathType: ImplementationSpecific
# Deploy the receive service:
[root@k8s-master01 demo]# kubectl create -f demo-receive-deploy.yaml -f demo-receive-svc.yaml -f demo-receive-ingress.yaml -n demo
3.1.5 Deploying the Frontend Service
[root@k8s-master01 demo]# vim demo-ui-deploy.yaml
[root@k8s-master01 demo]# cat demo-ui-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: demo-ui
  name: demo-ui
  namespace: demo
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: demo-ui
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: demo-ui
    spec:
      containers:
      - image: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/demo-ui:sw
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 2
          initialDelaySeconds: 10
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 80
          timeoutSeconds: 2
        name: demo-ui
        readinessProbe:
          failureThreshold: 2
          initialDelaySeconds: 10
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 80
          timeoutSeconds: 2
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
[root@k8s-master01 demo]# vim demo-ui-svc.yaml
[root@k8s-master01 demo]# cat demo-ui-svc.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: demo-ui
  name: demo-ui
  namespace: demo
spec:
  ports:
  - name: http-web
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: demo-ui
  sessionAffinity: None
  type: ClusterIP
[root@k8s-master01 demo]# vim demo-ui-ingress.yaml
[root@k8s-master01 demo]# cat demo-ui-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo-ui
  namespace: demo
spec:
  ingressClassName: nginx
  rules:
  - host: demo.test.com
    http:
      paths:
      - backend:
          service:
            name: demo-ui
            port:
              number: 80
        path: /
        pathType: ImplementationSpecific
# Deploy the frontend service:
[root@k8s-master01 demo]# kubectl create -f demo-ui-deploy.yaml -f demo-ui-svc.yaml -f demo-ui-ingress.yaml -n demo
# After deployment, the final set of resources looks like this:
[root@k8s-master01 demo]# kubectl get po,svc,ingress -n demo
NAME                                READY   STATUS    RESTARTS      AGE
pod/demo-handler-5b6f9dd9c7-g4k5s   1/1     Running   1 (25m ago)   26m
pod/demo-order-755cdc96-8qlc9       1/1     Running   0             47m
pod/demo-receive-5cf555cdfd-j5g76   1/1     Running   1 (14m ago)   16m
pod/demo-ui-66bb5f4d67-smbpb        1/1     Running   0             83s
pod/mysql-6d698b4676-sk8hj          1/1     Running   0             155m
NAME                   TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
service/demo-receive   ClusterIP   10.103.251.213   <none>        8080/TCP         16m
service/demo-ui        ClusterIP   10.106.49.125    <none>        80/TCP           83s
service/handler        ClusterIP   10.102.43.148    <none>        80/TCP           26m
service/mysql          NodePort    10.111.54.12     <none>        3306:32541/TCP   155m
service/order          ClusterIP   10.101.166.166   <none>        80/TCP           47m
NAME                                     CLASS   HOSTS           ADDRESS          PORTS   AGE
ingress.networking.k8s.io/demo-order     nginx   demo.test.com   192.168.200.52   80      47m
ingress.networking.k8s.io/demo-receive   nginx   demo.test.com   192.168.200.52   80      16m
ingress.networking.k8s.io/demo-ui        nginx   demo.test.com   192.168.200.52   80      83s
Next, open the site in a browser:
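3.2 Service Access and Monitoring
Next, open the page and try generating a password and creating an order:
The architecture diagram of the whole project then becomes visible:
Creating an order involves a random delay, and that delay also shows up in the trace information in SkyWalking:
3.3 Simulating a Failure
# Next, simulate a failure by scaling the handler service and MySQL down to zero replicas:
[root@k8s-master01 demo]# kubectl scale deploy demo-handler mysql --replicas=0 -n demo
Accessing the page again produces trace data that contains the errors: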
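4. SkyWalking Alerting
4.1 SkyWalking Alert Notifications
SkyWalking can monitor the collected metrics and raise alerts, so that anomalies get a timely response. With sensible alert rules and hooks, potential problems can be caught early and the related issues located quickly.
The core of SkyWalking alerting is driven by a set of rules and consists of three parts:
- Metrics: the performance metrics that SkyWalking collects about services, instances, and endpoints
- Rules: the conditions that trigger an alert. They are defined by default in the config/alarm-settings.yml file and support comparison operators, logical operators, and so on
- Hooks: the actions executed after an alert is triggered, such as sending a notification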
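4.2 SkyWalking Alert Rules
A SkyWalking alert rule consists of the following elements:
- Rule name: globally unique, and must end with _rule
- expression: defined with MQE (Metrics Query Expression). The result of the expression must be a SINGLE_VALUE, and the root operation must be a comparison or boolean operation that yields 1 (true) or 0 (false). The alert is triggered when the result is 1 (true)
- include-names: the entity names the rule applies to, which can be Services, Instances, Endpoints, and so on; a list
- exclude-names: the entity names to exclude
- include-names-regex: include entities by regular expression
- exclude-names-regex: exclude entities by regular expression
- tags: extra labels attached to the alert, for example level=warning
- period: the size of the time window over which the alert condition is checked, in minutes
- silence-period: after an alert is triggered, it is not triggered again during this period. If not set, it is the same as period
- hooks: the names of the hooks bound to the rule. The name format is {hookType}.{hookName} (for example slack.custom1), and the hook must be defined in the hooks section of the alarm-settings.yml file. If no hook name is given, the global hooks are used
- message: the alert message, which can describe the current alert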
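To make the interaction between period and silence-period more concrete, here is a deliberately simplified model of a rule that is evaluated once per minute. It is only a sketch: the `AlarmRule` class, the `on_minute` method, and the `check` callback are hypothetical, and the real evaluation in the SkyWalking OAP server is more involved.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


class AlarmRule:
    """Simplified per-minute rule evaluation (hypothetical model)."""

    def __init__(
        self,
        name: str,
        period: int,
        check: Callable[[List[float]], bool],
        silence_period: Optional[int] = None,
    ) -> None:
        if not name.endswith("_rule"):
            raise ValueError("rule names must end with _rule")
        self.name = name
        self.check = check
        # Keep only the last `period` minutes of metric values.
        self.window: Deque[float] = deque(maxlen=period)
        # silence-period falls back to period when it is not given.
        self.silence_period = period if silence_period is None else silence_period
        self._silent_left = 0

    def on_minute(self, value: float) -> bool:
        """Feed one minute of data; return True when an alert is sent."""
        self.window.append(value)
        if self._silent_left > 0:
            self._silent_left -= 1
            return False
        if self.check(list(self.window)):
            self._silent_left = self.silence_period
            return True
        return False


# Alert when the average response time in the window exceeds 1000 ms.
rule = AlarmRule("slow_service_rule", period=3,
                 check=lambda w: sum(w) / len(w) > 1000)
fired = [rule.on_minute(v) for v in [500, 2000, 2000, 2000, 2000, 2000, 2000]]
```

In this run the condition holds from the second minute on, but the rule only sends an alert in the second and sixth minutes, because the three minutes in between fall inside the silence period.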
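4.3 Configuring a DingTalk Alert Robot
To use DingTalk alerts, first create a group chat and then add a robot to it:
Add a robot
Choose "Custom"
Fill in the robot name and copy the secret
Finish adding the robot and copy the Webhook URL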
4.4 Connecting SkyWalking to DingTalk Alerts
First, place the SkyWalking alert configuration file inside the SkyWalking chart directory:
# Create the directory for the alert configuration
[root@k8s-master01 demo]# mkdir -p ../files/conf.d/oap
[root@k8s-master01 demo]# cd ../files/conf.d/oap
# Copy the alert template file out of the OAP container
[root@k8s-master01 oap]# kubectl cp skywalking-oap-6d8f594b7c-xrnbr:/skywalking/config/alarm-settings.yml ./alarm-settings.yml -n skywalking
# Add the DingTalk hook
[root@k8s-master01 oap]# vim alarm-settings.yml
[root@k8s-master01 oap]# tail -14 alarm-settings.yml
hooks:
  dingtalk:
    default:
      is-default: true
      text-template: |-
        {
          "msgtype": "text",
          "text": {
            "content": "Apache SkyWalking Alarm: \n %s."
          }
        }
      webhooks:
      - url: https://oapi.dingtalk.com/robot/send?access_token=c7cd207fd31cd72f433d67effda0568b681b10f626f97c02cb55f03b73b651c5
        secret: SECedef18728aa48ea6ca4c2f595967f6c389e2fc4d13bfca2741087b8c8878e017
# Apply the configuration (go back to the SkyWalking chart root directory first)
[root@k8s-master01 oap]# cd ../../..
[root@k8s-master01 skywalking]# helm upgrade skywalking . -n skywalking
# Check the Pod update status:
[root@k8s-master01 skywalking]# kubectl get po -n skywalking | grep oap
skywalking-oap-5644bbbd46-hvvxx 1/1 Running 0 11m
# Check whether the configuration file was updated:
[root@k8s-master01 skywalking]# kubectl exec skywalking-oap-5644bbbd46-hvvxx -n skywalking -- tail -14 config/alarm-settings.yml
Defaulted container "oap" out of: oap, wait-for-elasticsearch (init)
hooks:
  dingtalk:
    default:
      is-default: true
      text-template: |-
        {
          "msgtype": "text",
          "text": {
            "content": "Apache SkyWalking Alarm: \n %s."
          }
        }
      webhooks:
      - url: https://oapi.dingtalk.com/robot/send?access_token=c7cd207fd31cd72f433d67effda0568b681b10f626f97c02cb55f03b73b651c5
        secret: SECedef18728aa48ea6ca4c2f595967f6c389e2fc4d13bfca2741087b8c8878e017
Send requests to the services to trigger an alert:
After a short wait, the alert message shows up in DingTalk
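When the robot is protected by a secret, DingTalk expects each webhook call to carry a timestamp and a signature. SkyWalking handles this itself once `secret` is configured. The sketch below only reproduces DingTalk's documented signing scheme (HMAC-SHA256 over "timestamp\nsecret", then Base64 and URL encoding), which can help when testing a webhook by hand. The token and secret values are placeholders, and the helper names `dingtalk_sign` and `signed_url` are hypothetical.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse
from typing import Optional


def dingtalk_sign(secret: str, timestamp_ms: int) -> str:
    """Return the URL-encoded signature for a DingTalk robot secret."""
    string_to_sign = f"{timestamp_ms}\n{secret}"
    digest = hmac.new(
        secret.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    return urllib.parse.quote_plus(base64.b64encode(digest))


def signed_url(webhook: str, secret: str, timestamp_ms: Optional[int] = None) -> str:
    """Append timestamp and sign parameters to a webhook URL."""
    ts = int(time.time() * 1000) if timestamp_ms is None else timestamp_ms
    return f"{webhook}&timestamp={ts}&sign={dingtalk_sign(secret, ts)}"


# Placeholder values, not real credentials.
url = signed_url(
    "https://oapi.dingtalk.com/robot/send?access_token=EXAMPLE_TOKEN",
    "SECexample",
    timestamp_ms=1700000000000,
)
```

A JSON body in the same shape as the `text-template` above can then be posted to `url` to check that the robot accepts messages.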
4.5 Custom Alert Rules
Besides the default alerts, custom alerts can be added. For example, to monitor whether the JVM threads of a Java service are blocked, use the instance_jvm_thread_blocked_state_thread_count metric.
# For example, alert when the number of blocked JVM threads is greater than 5:
[root@k8s-master01 oap]# vim alarm-settings.yml
[root@k8s-master01 oap]# cat alarm-settings.yml
....
rules:
  thread_block_rule:
    expression: sum(instance_jvm_thread_blocked_state_thread_count > 5) >= 2
    period: 5 # Check the data of the past 5 minutes
    message: "Service {name}: blocked thread count exceeded 5 in at least 2 of the past 5 minutes"
....
# After changing the configuration file, apply the update:
[root@k8s-master01 skywalking]# helm upgrade skywalking -n skywalking .
[root@k8s-master01 skywalking]# kubectl rollout restart deploy skywalking-oap -n skywalking
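One way to read the expression sum(instance_jvm_thread_blocked_state_thread_count > 5) >= 2 with period: 5 is: compare each of the last five one-minute values with 5, count how many comparisons are true, and trigger when that count is at least 2. The function below is a simplified sketch of that reading, not the MQE engine itself; `rule_fires` and its parameters are hypothetical names.

```python
from typing import Sequence


def rule_fires(values: Sequence[float], threshold: float = 5, min_hits: int = 2) -> bool:
    """Return True when at least `min_hits` values in the window exceed `threshold`."""
    hits = sum(1 for v in values if v > threshold)  # each true comparison counts as 1
    return hits >= min_hits


# Blocked-thread counts for the last five minutes (made-up sample data).
window_alert = [0, 6, 3, 7, 1]  # two minutes above 5
window_quiet = [0, 6, 3, 2, 1]  # only one minute above 5

alert = rule_fires(window_alert)
quiet = rule_fires(window_quiet)
```

Under this reading a single spike does not trigger the alert; the blocked-thread count has to exceed the threshold in at least two of the minutes inside the window.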
This blog post is based on: https://edu.51cto.com/lecturer/11062970.html