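Monitoring a Tencent Cloud Kubernetes cluster with a self-hosted Prometheus
Use case
Kubernetes cluster (Tencent Cloud container service, TKE)
Prometheus (self-hosted service outside the cluster)
Tencent Cloud provides documentation on monitoring a TKE cluster with a Prometheus instance deployed inside the cluster; see that document for reference.
In the environment described here, Prometheus runs on a cloud server outside the Kubernetes cluster, which differs slightly from the linked document. The steps below show how to monitor a Tencent Cloud Kubernetes cluster from a Prometheus instance hosted outside the cluster.
Configuration steps
Create the ServiceAccount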
kubectl create sa prometheus-sa
Create the ClusterRole
vi ClusterRole.yml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-kubelet-ro
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["nodes/metrics"]
  verbs: ["get"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
kubectl apply -f ClusterRole.yml
Create the ClusterRoleBinding
kubectl create clusterrolebinding prometheus-sa-binding --clusterrole=prometheus-kubelet-ro --serviceaccount=default:prometheus-sa
Verify the permissions
kubectl auth can-i get nodes/metrics --as=system:serviceaccount:default:prometheus-sa
kubectl auth can-i get nodes --as=system:serviceaccount:default:prometheus-sa
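The two checks above cover only part of what the ClusterRole grants. As a rough sketch, the same `kubectl auth can-i` call can be looped over every verb/resource pair in `prometheus-kubelet-ro`. The function name `check_prometheus_sa_perms` is made up for this example, and it assumes `kubectl` is already pointed at the TKE cluster.

```shell
# Hypothetical helper (not part of kubectl): checks every verb/resource pair
# granted by the prometheus-kubelet-ro ClusterRole for the ServiceAccount.
# Returns non-zero if any single check is not answered with "yes".
check_prometheus_sa_perms() {
  sa="${1:-system:serviceaccount:default:prometheus-sa}"
  rc=0
  for pair in "get nodes" "list nodes" "watch nodes" \
              "get nodes/metrics" \
              "get pods" "list pods" "watch pods"; do
    verb=${pair%% *}       # text before the first space
    resource=${pair#* }    # text after the first space
    # "kubectl auth can-i" prints "yes" or "no"
    answer=$(kubectl auth can-i "$verb" "$resource" --as="$sa" 2>/dev/null)
    printf '%-5s %-14s %s\n' "$verb" "$resource" "${answer:-error}"
    [ "$answer" = "yes" ] || rc=1
  done
  return "$rc"
}
```

Calling `check_prometheus_sa_perms` prints one line per permission; a non-zero exit status suggests the ClusterRole or the ClusterRoleBinding was not applied as intended.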
Generate the token
# Replace $prometheus_dir with the correct directory
kubectl -n default get secret prometheus-sa-token -o jsonpath='{.data.token}' | base64 -d > $prometheus_dir/secret/kube-token
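The command above assumes that a Secret named `prometheus-sa-token` already exists. On Kubernetes 1.24 and later, token Secrets are no longer created automatically for a ServiceAccount, so it may have to be created by hand. The sketch below shows one way to do that and to sanity-check the exported file. Both function names are hypothetical, and the check relies only on the fact that a ServiceAccount token is a JWT (three dot-separated base64url segments).

```shell
# Hypothetical helper: prints a manifest for a long-lived token Secret bound
# to the ServiceAccount. Arguments (all optional):
#   $1 = ServiceAccount name, $2 = Secret name, $3 = namespace
sa_token_secret_manifest() {
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: ${2:-prometheus-sa-token}
  namespace: ${3:-default}
  annotations:
    kubernetes.io/service-account.name: ${1:-prometheus-sa}
type: kubernetes.io/service-account-token
EOF
}

# Hypothetical helper: succeeds only if the file is non-empty and its content
# has the shape of a JWT (header.payload.signature).
token_file_ok() {
  [ -s "$1" ] || return 1
  grep -Eq '^[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+$' "$1"
}
```

A possible usage is `sa_token_secret_manifest | kubectl apply -f -` before exporting the token, followed by `token_file_ok $prometheus_dir/secret/kube-token` afterwards. The exported file must be the one that `bearer_token_file` in the Prometheus configuration points to.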
Prometheus configuration
- job_name: 'tke-cadvisor'
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics/cadvisor
  scheme: https
  kubernetes_sd_configs:
  - role: node
    api_server: "https://<apiserver>:<port>"
    # Token used for service discovery (SD)
    bearer_token_file: /etc/prometheus/secrets/kube-token
    # TLS settings used for service discovery (SD)
    tls_config:
      insecure_skip_verify: true
  # Token used for scraping
  bearer_token_file: /etc/prometheus/secrets/kube-token
  # TLS settings used for scraping
  tls_config:
    insecure_skip_verify: true
  relabel_configs:
  - source_labels: [__meta_kubernetes_node_label_node_kubernetes_io_instance_type]
    regex: eklet
    action: drop
  - source_labels: [__meta_kubernetes_node_address_InternalIP]
    target_label: __address__
    replacement: "${1}:10250"
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
- job_name: 'tke-node'
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - role: node
    api_server: "https://<apiserver>:<port>"
    bearer_token_file: /etc/prometheus/secrets/kube-token
    tls_config:
      insecure_skip_verify: true
  bearer_token_file: /etc/prometheus/secrets/kube-token
  relabel_configs:
  - source_labels: [__meta_kubernetes_node_label_node_kubernetes_io_instance_type]
    regex: eklet
    action: drop
  - source_labels: [__meta_kubernetes_node_address_InternalIP]
    target_label: __address__
    replacement: "${1}:9100"
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
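[!NOTE]
1. The kubelet certificates on TKE nodes are self-signed, so certificate verification has to be skipped; insecure_skip_verify must therefore be set to true.
2. bearer_token_file and insecure_skip_verify must be set both under kubernetes_sd_configs and at the job level. If they are missing under kubernetes_sd_configs, service discovery cannot discover the Kubernetes nodes. If they are missing at the job level, Prometheus receives a 401 Unauthorized error when scraping /metrics/cadvisor.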
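To tell a token problem apart from a Prometheus configuration problem, the scrape can be reproduced by hand from the Prometheus host. The sketch below is only an illustration: `kubelet_scrape_status` is a hypothetical name, the default token path is taken from the configuration above, and `-k` mirrors `insecure_skip_verify: true`.

```shell
# Hypothetical helper: prints the HTTP status code that a node's kubelet
# returns for /metrics/cadvisor when called with the ServiceAccount token.
#   $1 = node InternalIP
#   $2 = token file (defaults to the path used in the Prometheus config)
kubelet_scrape_status() {
  node_ip="$1"
  token_file="${2:-/etc/prometheus/secrets/kube-token}"
  # -s: silent, -k: skip certificate verification (self-signed kubelet cert),
  # -o /dev/null: discard the body, -w: print only the status code
  curl -sk --max-time 10 -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer $(cat "$token_file")" \
    "https://${node_ip}:10250/metrics/cadvisor"
}
```

For example, `kubelet_scrape_status <node-ip>` printing `200` suggests the token is accepted by the kubelet, while `401` matches the unauthorized error described in the note.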