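In "R&D Engineers Mastering Kubernetes: Startup, Liveness, and Readiness Probes", we covered the special relationship between readiness probes and Services. A failed readiness probe does not mean the whole program is "not alive". The container may only be temporarily unable to serve traffic, for example when the CPU is briefly saturated and the probe times out. In that case, unlike a failed liveness probe, a failed readiness probe does not trigger a container restart. Kubernetes simply removes the Pod from the Services associated with it.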
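The difference between the two probe types can be sketched as a toy model. This is only an illustration, not Kubernetes code: the `Pod` class and the functions `on_liveness_failure`, `on_readiness_failure`, and `service_endpoints` are hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pod:
    """Hypothetical, minimal stand-in for a Pod with one container."""
    ip: str
    ready: bool = True
    restart_count: int = 0


def on_liveness_failure(pod: Pod) -> None:
    # A failed liveness probe leads to a container restart.
    pod.restart_count += 1


def on_readiness_failure(pod: Pod) -> None:
    # A failed readiness probe only marks the Pod as not ready; no restart.
    pod.ready = False


def service_endpoints(pods: List[Pod]) -> List[str]:
    # A Service only routes traffic to Pods that are currently ready.
    return [p.ip for p in pods if p.ready]


pods = [Pod("10.1.209.155"), Pod("10.1.43.223")]
on_readiness_failure(pods[1])
print(service_endpoints(pods))   # ['10.1.209.155']
print(pods[1].restart_count)     # 0
```

The Pod that failed its readiness check drops out of the endpoint list, while its restart count stays at zero.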
Nginx with a Readiness Probe
apiVersion: apps/v1
kind: Deployment
metadata:
  name: readiness-nginx-deployment
spec:
  selector:
    matchLabels:
      app: readiness-nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: readiness-nginx
    spec:
      containers:
      - name: readiness-nginx-container
        image: nginx
        ports:
        - containerPort: 80
        command: ["/bin/sh", "-c", "sleep 3; touch /tempdir/readiness-nginx; while true; do sleep 5; done"]
        volumeMounts:
        - name: probe-volume
          mountPath: /tempdir
        readinessProbe:
          exec:
            command:
            - cat
            - /tempdir/readiness-nginx
          initialDelaySeconds: 2
          failureThreshold: 6
          periodSeconds: 1
          successThreshold: 1
      volumes:
      - name: probe-volume
        emptyDir:
          medium: Memory
          sizeLimit: 1Gi
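To make the roles of failureThreshold and successThreshold more concrete, here is a simplified model of how consecutive probe results could flip the ready state. It is a sketch under assumptions, not the kubelet's actual implementation: the class name `ReadinessTracker` and its `observe` method are invented for this example, and the model assumes a container starts out not ready.

```python
class ReadinessTracker:
    """Toy model of readiness state driven by consecutive probe results."""

    def __init__(self, failure_threshold: int = 6, success_threshold: int = 1):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.ready = False   # assumption: not ready until a probe succeeds
        self._last = None    # outcome of the previous probe
        self._run = 0        # length of the current streak of equal outcomes

    def observe(self, success: bool) -> bool:
        """Record one probe result and return the resulting ready state."""
        if success == self._last:
            self._run += 1
        else:
            self._last = success
            self._run = 1
        threshold = self.success_threshold if success else self.failure_threshold
        if self._run >= threshold:
            self.ready = success
        return self.ready


tracker = ReadinessTracker(failure_threshold=6, success_threshold=1)
print(tracker.observe(True))                        # True
print([tracker.observe(False) for _ in range(6)])   # [True, True, True, True, True, False]
print(tracker.observe(True))                        # True
```

With the values from the manifest (periodSeconds: 1, failureThreshold: 6), this model suggests a Pod would be marked not ready roughly six seconds after its check starts failing, and ready again after a single successful check.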
The Service Associated with Nginx
kind: Service
apiVersion: v1
metadata:
  name: readiness-nginx-service
spec:
  selector:
    app: readiness-nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
Experiment
After creating the components above, we can see that the following Pods have started.
kubectl get pod -o wide
NAME                                          READY   STATUS    RESTARTS   AGE   IP             NODE      NOMINATED NODE   READINESS GATES
readiness-nginx-deployment-57b7fd5644-7x7wc   1/1     Running   0          25s   10.1.43.223    ubuntuc   <none>           <none>
readiness-nginx-deployment-57b7fd5644-lhszp   1/1     Running   0          25s   10.1.209.155   ubuntub   <none>           <none>
The Service is also bound to these IPs.
kubectl describe endpoints readiness-nginx-service
Name:         readiness-nginx-service
Namespace:    default
Labels:       <none>
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2023-08-14T14:35:33Z
Subsets:
  Addresses:          10.1.209.155,10.1.43.223
  NotReadyAddresses:  <none>
  Ports:
    Name     Port  Protocol
    ----     ----  --------
    <unset>  80    TCP

Events:  <none>
Now we pick one of the Pods (readiness-nginx-deployment-57b7fd5644-7x7wc, 10.1.43.223) and look at its events:
kubectl describe pod readiness-nginx-deployment-57b7fd5644-7x7wc
Name:             readiness-nginx-deployment-57b7fd5644-7x7wc
Namespace:        default
Priority:         0
Service Account:  default
Node:             ubuntuc/172.22.247.176
Start Time:       Mon, 14 Aug 2023 14:35:27 +0000
Labels:           app=readiness-nginx
                  pod-template-hash=57b7fd5644
Annotations:      cni.projectcalico.org/containerID: c475d3e82ff0d5adbd35252ab990608ad75955f8d0862bb8b0c54ee60a0878eb
                  cni.projectcalico.org/podIP: 10.1.43.223/32
                  cni.projectcalico.org/podIPs: 10.1.43.223/32
Status:           Running
IP:               10.1.43.223
IPs:
  IP:           10.1.43.223
Controlled By:  ReplicaSet/readiness-nginx-deployment-57b7fd5644
Containers:
  readiness-nginx-container:
    Container ID:  containerd://5d82d8467bc6e0c8151e40ee3258d54bffec8659bcdad4a441848ea8f77a3223
    Image:         nginx
    Image ID:      docker.io/library/nginx@sha256:67f9a4f10d147a6e04629340e6493c9703300ca23a2f7f3aa56fe615d75d31ca
    Port:          80/TCP
    Host Port:     0/TCP
    Command:
      /bin/sh
      -c
      sleep 3; touch /tempdir/readiness-nginx; while true; do sleep 5; done
    State:          Running
      Started:      Mon, 14 Aug 2023 14:35:30 +0000
    Ready:          True
    Restart Count:  0
    Readiness:      exec [cat /tempdir/readiness-nginx] delay=2s timeout=1s period=1s #success=1 #failure=6
    Environment:    <none>
    Mounts:
      /tempdir from probe-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-c4tcl (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  probe-volume:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  1Gi
  kube-api-access-c4tcl:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  3m53s                  default-scheduler  Successfully assigned default/readiness-nginx-deployment-57b7fd5644-7x7wc to ubuntuc
  Normal   Pulling    3m53s                  kubelet            Pulling image "nginx"
  Normal   Pulled     3m50s                  kubelet            Successfully pulled image "nginx" in 2.489885583s (2.489893984s including waiting)
  Normal   Created    3m50s                  kubelet            Created container readiness-nginx-container
  Normal   Started    3m50s                  kubelet            Started container readiness-nginx-container
  Warning  Unhealthy  3m48s (x2 over 3m48s)  kubelet            Readiness probe failed: cat: /tempdir/readiness-nginx: No such file or directory
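The events show that the readiness probe failed twice (x2) because the marker file did not exist yet, and that the file was there by about the third check. At this point the Pod's Ready and ContainersReady conditions are both True.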
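The two initial failures are roughly what an idealized timeline predicts. The sketch below assumes probes run exactly at initialDelaySeconds + i * periodSeconds after the container starts, and that the marker file appears slightly after the 3-second sleep (3.1 s is an assumed value). The function name `count_initial_failures` is hypothetical, and a real kubelet does not schedule probes this precisely, so actual counts can differ.

```python
def count_initial_failures(file_created_at: float,
                           initial_delay: float,
                           period: float,
                           max_probes: int = 100) -> int:
    """Count probes that run before the marker file exists (idealized timing)."""
    failures = 0
    for i in range(max_probes):
        probe_time = initial_delay + i * period
        if probe_time >= file_created_at:
            # The file exists by now, so `cat` succeeds and the probe passes.
            return failures
        failures += 1
    return failures


# Values from the manifest: initialDelaySeconds=2, periodSeconds=1.
# Assumption: `sleep 3; touch ...` finishes about 3.1 s after container start.
print(count_initial_failures(file_created_at=3.1, initial_delay=2, period=1))  # 2
```

Under these assumptions the checks at 2 s and 3 s fail and the check at 4 s passes, which matches the x2 count in the events.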
Ready -> Not Ready
Now we delete the readiness marker file.
kubectl exec pods/readiness-nginx-deployment-57b7fd5644-7x7wc --container readiness-nginx-container -- rm /tempdir/readiness-nginx
Looking at the Pod's status again, we find:
Name:             readiness-nginx-deployment-57b7fd5644-7x7wc
Namespace:        default
Priority:         0
Service Account:  default
Node:             ubuntuc/172.22.247.176
Start Time:       Mon, 14 Aug 2023 14:35:27 +0000
Labels:           app=readiness-nginx
                  pod-template-hash=57b7fd5644
Annotations:      cni.projectcalico.org/containerID: c475d3e82ff0d5adbd35252ab990608ad75955f8d0862bb8b0c54ee60a0878eb
                  cni.projectcalico.org/podIP: 10.1.43.223/32
                  cni.projectcalico.org/podIPs: 10.1.43.223/32
Status:           Running
IP:               10.1.43.223
IPs:
  IP:           10.1.43.223
Controlled By:  ReplicaSet/readiness-nginx-deployment-57b7fd5644
Containers:
  readiness-nginx-container:
    Container ID:  containerd://5d82d8467bc6e0c8151e40ee3258d54bffec8659bcdad4a441848ea8f77a3223
    Image:         nginx
    Image ID:      docker.io/library/nginx@sha256:67f9a4f10d147a6e04629340e6493c9703300ca23a2f7f3aa56fe615d75d31ca
    Port:          80/TCP
    Host Port:     0/TCP
    Command:
      /bin/sh
      -c
      sleep 3; touch /tempdir/readiness-nginx; while true; do sleep 5; done
    State:          Running
      Started:      Mon, 14 Aug 2023 14:35:30 +0000
    Ready:          False
    Restart Count:  0
    Readiness:      exec [cat /tempdir/readiness-nginx] delay=2s timeout=1s period=1s #success=1 #failure=6
    Environment:    <none>
    Mounts:
      /tempdir from probe-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-c4tcl (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  probe-volume:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  1Gi
  kube-api-access-c4tcl:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                  From     Message
  ----     ------     ----                 ----     -------
  Warning  Unhealthy  7s (x22 over 6m6s)   kubelet  Readiness probe failed: cat: /tempdir/readiness-nginx: No such file or directory
Both Ready and ContainersReady have changed to False, while the Restart Count is still 0.
Next we look at the Service again.
kubectl describe endpoints readiness-nginx-service
Name:         readiness-nginx-service
Namespace:    default
Labels:       <none>
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2023-08-14T14:41:18Z
Subsets:
  Addresses:          10.1.209.155
  NotReadyAddresses:  10.1.43.223
  Ports:
    Name     Port  Protocol
    ----     ----  --------
    <unset>  80    TCP

Events:  <none>
The Pod whose readiness marker file was deleted has been removed from the Service: its IP moved from Addresses to NotReadyAddresses.
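If you want to check this from a script instead of reading the describe output, one option is to fetch the object as JSON (for example with kubectl get endpoints readiness-nginx-service -o json) and split the addresses yourself. The sketch below is a minimal example: `SAMPLE` is a hand-written, trimmed document that mirrors the values shown above rather than captured output, and `split_addresses` is a hypothetical helper name.

```python
import json
from typing import List, Tuple

# Hand-written sample that mirrors the describe output above (not captured output).
SAMPLE = """
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {"name": "readiness-nginx-service", "namespace": "default"},
  "subsets": [
    {
      "addresses": [{"ip": "10.1.209.155"}],
      "notReadyAddresses": [{"ip": "10.1.43.223"}],
      "ports": [{"port": 80, "protocol": "TCP"}]
    }
  ]
}
"""


def split_addresses(endpoints: dict) -> Tuple[List[str], List[str]]:
    """Return (ready IPs, not-ready IPs) from a v1 Endpoints object."""
    ready: List[str] = []
    not_ready: List[str] = []
    for subset in endpoints.get("subsets") or []:
        # Either list may be missing when it is empty.
        ready += [a["ip"] for a in subset.get("addresses") or []]
        not_ready += [a["ip"] for a in subset.get("notReadyAddresses") or []]
    return ready, not_ready


ready, not_ready = split_addresses(json.loads(SAMPLE))
print("ready:", ready)          # ready: ['10.1.209.155']
print("not ready:", not_ready)  # not ready: ['10.1.43.223']
```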
Not Ready -> Ready
Now we restore the marker file.
kubectl exec pods/readiness-nginx-deployment-57b7fd5644-7x7wc --container readiness-nginx-container -- touch /tempdir/readiness-nginx
Checking the Pod's status, we see that Ready and ContainersReady are True again.
Name:             readiness-nginx-deployment-57b7fd5644-7x7wc
Namespace:        default
Priority:         0
Service Account:  default
Node:             ubuntuc/172.22.247.176
Start Time:       Mon, 14 Aug 2023 14:35:27 +0000
Labels:           app=readiness-nginx
                  pod-template-hash=57b7fd5644
Annotations:      cni.projectcalico.org/containerID: c475d3e82ff0d5adbd35252ab990608ad75955f8d0862bb8b0c54ee60a0878eb
                  cni.projectcalico.org/podIP: 10.1.43.223/32
                  cni.projectcalico.org/podIPs: 10.1.43.223/32
Status:           Running
IP:               10.1.43.223
IPs:
  IP:           10.1.43.223
Controlled By:  ReplicaSet/readiness-nginx-deployment-57b7fd5644
Containers:
  readiness-nginx-container:
    Container ID:  containerd://5d82d8467bc6e0c8151e40ee3258d54bffec8659bcdad4a441848ea8f77a3223
    Image:         nginx
    Image ID:      docker.io/library/nginx@sha256:67f9a4f10d147a6e04629340e6493c9703300ca23a2f7f3aa56fe615d75d31ca
    Port:          80/TCP
    Host Port:     0/TCP
    Command:
      /bin/sh
      -c
      sleep 3; touch /tempdir/readiness-nginx; while true; do sleep 5; done
    State:          Running
      Started:      Mon, 14 Aug 2023 14:35:30 +0000
    Ready:          True
    Restart Count:  0
    Readiness:      exec [cat /tempdir/readiness-nginx] delay=2s timeout=1s period=1s #success=1 #failure=6
    Environment:    <none>
    Mounts:
      /tempdir from probe-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-c4tcl (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  probe-volume:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  1Gi
  kube-api-access-c4tcl:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                    From     Message
  ----     ------     ----                   ----     -------
  Warning  Unhealthy  3m5s (x262 over 13m)   kubelet  Readiness probe failed: cat: /tempdir/readiness-nginx: No such file or directory
The Service has also added the Pod back.
Name:         readiness-nginx-service
Namespace:    default
Labels:       <none>
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2023-08-14T14:48:23Z
Subsets:
  Addresses:          10.1.209.155,10.1.43.223
  NotReadyAddresses:  <none>
  Ports:
    Name     Port  Protocol
    ----     ----  --------
    <unset>  80    TCP

Events:  <none>