Preface:
While deploying Prometheus recently I ran into a problem that feels like a classic, so it is worth writing down.
The symptom: after deploying the main Prometheus service, no pods showed up, only the deployment. In a moment of panic I assumed the cluster itself was broken and went as far as rebuilding it. The details are shown below:
Note: here UP-TO-DATE is the number of replicas updated to the desired pod template (0 means nothing has been rolled out), and AVAILABLE is the number of pods available to serve (0 means no usable pod).
[root@node4 yaml]# k get deployments.apps -n monitor-sa
NAME READY UP-TO-DATE AVAILABLE AGE
prometheus-server 0/2 0 0 2m5s
[root@node4 yaml]# k get po -n monitor-sa
NAME READY STATUS RESTARTS AGE
node-exporter-6ttbl 1/1 Running 0 23h
node-exporter-7ls5t 1/1 Running 0 23h
node-exporter-r287q 1/1 Running 0 23h
node-exporter-z85dm 1/1 Running 0 23h
The deployment file is as follows.
Note carefully: it references a ServiceAccount, via serviceAccountName: monitor.
[root@node4 yaml]# cat prometheus-deploy.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-server
  namespace: monitor-sa
  labels:
    app: prometheus
spec:
  replicas: 2
  selector:
    matchLabels:
      app: prometheus
      component: server
    #matchExpressions:
    #- {key: app, operator: In, values: [prometheus]}
    #- {key: component, operator: In, values: [server]}
  template:
    metadata:
      labels:
        app: prometheus
        component: server
      annotations:
        prometheus.io/scrape: 'false'
    spec:
      nodeName: node4
      serviceAccountName: monitor
      containers:
      - name: prometheus
        image: prom/prometheus:v2.2.1
        imagePullPolicy: IfNotPresent
        command:
        - prometheus
        - --config.file=/etc/prometheus/prometheus.yml
        - --storage.tsdb.path=/prometheus
        - --storage.tsdb.retention=720h
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: /etc/prometheus/prometheus.yml
          name: prometheus-config
          subPath: prometheus.yml
        - mountPath: /prometheus/
          name: prometheus-storage-volume
      volumes:
      - name: prometheus-config
        configMap:
          name: prometheus-config
          items:
          - key: prometheus.yml
            path: prometheus.yml
            mode: 0644
      - name: prometheus-storage-volume
        hostPath:
          path: /data
          type: Directory
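Before applying a file like this, a quick sanity check on its implicit dependencies can save the trouble described below. This is just a sketch, assuming the resource names used in the file above:

```shell
# Does the ServiceAccount referenced by serviceAccountName exist?
kubectl get serviceaccount monitor -n monitor-sa
# Does the ConfigMap mounted as a volume exist?
kubectl get configmap prometheus-config -n monitor-sa
```

If either command returns NotFound, the deployment will be accepted but its pods will never be created (or never start).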
Solution:
So, faced with this situation, what should we do? First of all, don't panic. Second, the Deployment controller has an easily overlooked feature: editing the deployment shows the detailed current state of that deployment, in the status field, and this gives us very useful information.
The command is kubectl edit deployment -n <namespace> <deployment-name>; in this case, kubectl edit deployment -n monitor-sa prometheus-server:
...... (output truncated; earlier part of the spec omitted) ......
            path: prometheus.yml
          name: prometheus-config
        name: prometheus-config
      - hostPath:
          path: /data
          type: Directory
        name: prometheus-storage-volume
status:
  conditions:
  - lastTransitionTime: "2023-11-22T15:21:06Z"
    lastUpdateTime: "2023-11-22T15:21:06Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: "2023-11-22T15:21:06Z"
    lastUpdateTime: "2023-11-22T15:21:06Z"
    message: 'pods "prometheus-server-78bbb77dd7-" is forbidden: error looking up
      service account monitor-sa/monitor: serviceaccount "monitor" not found'
    reason: FailedCreate
    status: "True"
    type: ReplicaFailure
  - lastTransitionTime: "2023-11-22T15:31:07Z"
    lastUpdateTime: "2023-11-22T15:31:07Z"
    message: ReplicaSet "prometheus-server-78bbb77dd7" has timed out progressing.
    reason: ProgressDeadlineExceeded
    status: "False"
    type: Progressing
  observedGeneration: 1
  unavailableReplicas: 2
We can see three messages. The first is the error from the title; when something is wrong, the dashboard shows this one first. The second message explains where the problem actually lies: here it says a ServiceAccount named monitor was not found. The third says the ReplicaSet controlled by this deployment timed out progressing, which is just a downstream consequence. So the second message is the important one: it is the key to solving the problem.
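If you prefer not to open an editor just to read status, the same conditions can be pulled directly. These are equivalent read-only commands, sketched against the names used in this example:

```shell
# Human-readable summary, including the Conditions table
kubectl describe deployment prometheus-server -n monitor-sa
# Or print just the condition types and messages
kubectl get deployment prometheus-server -n monitor-sa \
  -o jsonpath='{range .status.conditions[*]}{.type}{": "}{.message}{"\n"}{end}'
```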
Appendix: the status of a healthy deployment:
This status tells us the deployment has one replica and was rolled out successfully, which is why its first message is "Deployment has minimum availability."
  serviceAccount: kube-state-metrics
  serviceAccountName: kube-state-metrics
  terminationGracePeriodSeconds: 30
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2023-11-21T14:56:14Z"
    lastUpdateTime: "2023-11-21T14:56:14Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2023-11-21T14:56:13Z"
    lastUpdateTime: "2023-11-21T14:56:14Z"
    message: ReplicaSet "kube-state-metrics-57794dcf65" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
The concrete fix:
Based on the error above, we need a ServiceAccount. If you don't want to grant overly broad permissions, you should write your own RBAC rules; here I take the lazy route and bind cluster-admin (far too permissive for production, but fine for a lab). The commands are as follows (note that -n is ignored for a ClusterRoleBinding, since it is cluster-scoped):
[root@node4 yaml]# k create sa -n monitor-sa monitor
serviceaccount/monitor created
[root@node4 yaml]# k create clusterrolebinding monitor-clusterrolebinding -n monitor-sa --clusterrole=cluster-admin --serviceaccount=monitor-sa:monitor
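The same ServiceAccount and binding can also be kept as a declarative manifest. This is a sketch equivalent to the two commands above; cluster-admin remains overly broad for production use:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: monitor
  namespace: monitor-sa
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: monitor-clusterrolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: monitor
  namespace: monitor-sa
```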
With the ServiceAccount in place, deploying again succeeds:
[root@node4 yaml]# k get po -n monitor-sa -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
node-exporter-6ttbl 1/1 Running 0 24h 192.168.123.12 node2 <none> <none>
node-exporter-7ls5t 1/1 Running 0 24h 192.168.123.11 node1 <none> <none>
node-exporter-r287q 1/1 Running 1 (2m57s ago) 24h 192.168.123.14 node4 <none> <none>
node-exporter-z85dm 1/1 Running 0 24h 192.168.123.13 node3 <none> <none>
prometheus-server-78bbb77dd7-6smlt 1/1 Running 0 20s 10.244.41.19 node4 <none> <none>
prometheus-server-78bbb77dd7-fhf5k 1/1 Running 0 20s 10.244.41.18 node4 <none> <none>
Summary:
So a missing ServiceAccount can leave the pods "invisible": the ServiceAccount is a required but implicit dependency of this deployment. Likewise, if the deployment file references a ConfigMap that was not created beforehand, the pods also fail to come up (in that case the pods do get created but stay stuck in ContainerCreating). This differs from a missing namespace, which is a required explicit dependency: without it, the API server refuses the creation outright.
Quota settings behave like the ServiceAccount case: a required, implicit constraint that only surfaces at pod-creation time.
For example, below is a quota file targeting the default namespace. It restricts default to at most 4 pods, at most 20 services, 10Gi of memory, and 5.5 CPU cores:
[root@node4 yaml]# cat quota-nginx.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: quota
  namespace: default
spec:
  hard:
    requests.cpu: "5.5"
    limits.cpu: "5.5"
    requests.memory: 10Gi
    limits.memory: 10Gi
    pods: "4"
    services: "20"
Now create an nginx deployment with 6 replicas:
[root@node4 yaml]# cat nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: "2023-11-22T16:13:33Z"
  generation: 1
  labels:
    app: nginx
  name: nginx
  namespace: default
  resourceVersion: "16411"
  uid: e9a5cdc5-c6f0-45fb-a001-fcdd695eb925
spec:
  progressDeadlineSeconds: 600
  replicas: 6
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: nginx
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx:1.18
        imagePullPolicy: IfNotPresent
        name: nginx
        resources:
          limits:
            cpu: 1
            memory: 1Gi
          requests:
            cpu: 500m
            memory: 512Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
After creating it, only four pods appear; the quota is in effect:
[root@node4 yaml]# k get po
NAME READY STATUS RESTARTS AGE
nginx-54f9858f64-g65pk 1/1 Running 0 4m50s
nginx-54f9858f64-h42vf 1/1 Running 0 4m50s
nginx-54f9858f64-s776t 1/1 Running 0 4m50s
nginx-54f9858f64-wl7wz 1/1 Running 0 4m50s
So, where are the other two pods?
[root@node4 yaml]# k get deployments.apps nginx -oyaml |grep message
    message: Deployment does not have minimum availability.
    message: 'pods "nginx-54f9858f64-p8rxf" is forbidden: exceeded quota: quota, requested:
    message: ReplicaSet "nginx-54f9858f64" is progressing.
The fix is simple: adjust the quota (or lower the replica count). I won't belabor the details of how to adjust it here.
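As one possible sketch (assuming the ResourceQuota is named quota in the default namespace, as in the file above), raising the pod limit so all six replicas fit could look like:

```shell
# Merge-patch the hard pod limit from 4 up to 6
kubectl patch resourcequota quota -n default \
  --type merge -p '{"spec":{"hard":{"pods":"6"}}}'
# The ReplicaSet will then retry and create the two missing pods
kubectl get po -n default
```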