Preface
The host machine could not reliably ping the K8s pods; packets kept getting dropped. I checked the firewall, the logs, and the pod details without finding anything wrong. Searching eventually revealed the cause: after upgrading the K8s version, the old CNI plugin was no longer compatible, so I upgraded Calico as well and wrote the process down. This article uses K8s 1.32.6 and Calico 3.30.2.
I. Delete the old CNI namespaces
Deleting the namespaces removes all the resources inside them at once.
I still suggest using kubectl delete -f xxx.yaml, or deleting the resources directly with kubectl delete.
Here I deleted them straight from etcd with the etcdctl command-line tool. (Do not imitate this lightly.)
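For reference, the safer path looks like this if you still have the manifests the old Calico was installed from (the file names here are assumptions; use whatever you originally applied):
# Delete in reverse order of installation
kubectl delete -f custom-resources.yaml
kubectl delete -f tigera-operator.yaml
What follows is the etcdctl route I actually took: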
# Install the command-line tool
[root@k8s-master ~]# apt install -y etcd
# Filter out the namespace keys
[root@k8s-master ~]# ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key get / --prefix --keys-only | grep namespace
# Find the namespaces belonging to Calico and delete them
[root@k8s-master ~]# ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key del /registry/namespaces/calico-apiserver
[root@k8s-master ~]# ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key del /registry/namespaces/calico-system
Of course, the same trick works for pods: change the grep pattern to pod, then del the matching keys, as in the sketch below.
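A minimal sketch of that (etcd stores pods under /registry/pods/<namespace>/<name>; the pod name below is hypothetical):
# List the pod keys
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key get / --prefix --keys-only | grep pod
# Delete one stuck pod record (hypothetical name)
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key del /registry/pods/calico-system/calico-node-xxxxx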
II. Reset the CoreDNS component
[root@k8s-master ~]# kubectl delete po -n kube-system -l k8s-app=kube-dns
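To watch the replacement CoreDNS pods come back up, a quick check:
kubectl get po -n kube-system -l k8s-app=kube-dns -w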
III. Download the new CNI plugin
wget https://raw.githubusercontent.com/projectcalico/calico/v3.30.2/manifests/operator-crds.yaml
wget https://raw.githubusercontent.com/projectcalico/calico/v3.30.2/manifests/tigera-operator.yaml
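Per the Calico docs for this release, operator-crds.yaml must be created before the operator itself. Part V below creates the operator from the raw URL; a sketch using the local files just downloaded would be:
# CRDs first, then the operator that depends on them
kubectl create -f operator-crds.yaml
kubectl create -f tigera-operator.yaml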
IV. Clear the finalizers
1. Check the current state (optional)
Confirm that the resource really is stuck in deletion, and see which finalizers it carries:
kubectl get installation default -n calico-system -o yaml | grep -E 'deletionTimestamp|finalizers' -C2
2. Remove all finalizers
Run the patch command below to empty the metadata.finalizers array:
kubectl patch installation default -n calico-system \
  --type=merge \
  -p '{"metadata":{"finalizers":[]}}'
3. Confirm the deletion marker is cleared
Inspect the resource again; it should no longer show a deletionTimestamp or any finalizers:
kubectl get installation default -n calico-system -o yaml | grep -E 'deletionTimestamp|finalizers' -C2
With that, Installation/default goes from "stuck deleting" back to a normal state, and the operator resumes managing it.
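The same stuck-deletion pattern can hit the namespaces from Part I. Rather than reaching into etcd, a namespace's spec.finalizers can be cleared through the finalize subresource; a sketch, assuming jq is installed:
kubectl get ns calico-system -o json \
  | jq '.spec.finalizers = []' \
  | kubectl replace --raw /api/v1/namespaces/calico-system/finalize -f -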
V. Deploy the new CNI plugin
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.30.2/manifests/tigera-operator.yaml
wget https://raw.githubusercontent.com/projectcalico/calico/v3.30.2/manifests/custom-resources.yaml
Note: remember to change the pod CIDR here to the pod network that was set when the K8s cluster was initialized!
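If you no longer remember that CIDR, a kubeadm cluster records it in the kubeadm-config ConfigMap (a quick check, assuming kubeadm was used to initialize the cluster):
kubectl -n kube-system get cm kubeadm-config -o yaml | grep -i podSubnet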
[root@k8s-master ~]# grep -C 2 'cidr' custom-resources.yaml
    - name: default-ipv4-ippool
      blockSize: 26
      cidr: 10.100.0.0/16
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
Once that is done, apply the manifest to create the resources
[root@k8s-master ~]# kubectl apply -f custom-resources.yaml
Wait for the resources to be created successfully
[root@k8s-master ~]# kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
calico-apiserver calico-apiserver-67fd59c4cb-p8qm8 1/1 Running 0 6m54s
calico-apiserver calico-apiserver-67fd59c4cb-v6qst 1/1 Running 0 2m47s
calico-system calico-kube-controllers-598d796659-5pnz9 1/1 Running 0 6m51s
calico-system calico-node-4nfmc 1/1 Running 0 6m52s
calico-system calico-node-n8l9r 1/1 Running 0 6m52s
calico-system calico-node-rqcsp 1/1 Running 0 6m52s
calico-system calico-typha-76846cfc98-fzsq4 1/1 Running 0 6m50s
calico-system calico-typha-76846cfc98-s9v7k 1/1 Running 0 6m52s
calico-system csi-node-driver-l9l6m 2/2 Running 0 6m52s
calico-system csi-node-driver-v6f7m 2/2 Running 0 6m52s
calico-system csi-node-driver-xrd4m 2/2 Running 0 6m52s
calico-system goldmane-5f56496f4c-69p7x 1/1 Running 0 6m52s
calico-system whisker-85957d9c7b-ckxw7 2/2 Running 0 6m52s
kube-system coredns-6766b7b6bb-lqmgs 1/1 Running 0 8s
kube-system coredns-6766b7b6bb-qqhg8 1/1 Running 0 72s
kube-system etcd-k8s-master 1/1 Running 3 (137m ago) 173m
kube-system kube-apiserver-k8s-master 1/1 Running 4 (135m ago) 173m
kube-system kube-controller-manager-k8s-master 1/1 Running 3 (137m ago) 173m
kube-system kube-proxy-7rg9h 1/1 Running 1 (137m ago) 138m
kube-system kube-proxy-tkpgx 1/1 Running 1 (137m ago) 138m
kube-system kube-proxy-vpjcw 1/1 Running 1 (137m ago) 138m
kube-system kube-scheduler-k8s-master 1/1 Running 3 (137m ago) 173m
tigera-operator tigera-operator-747864d56d-9bdfv 1/1 Running 0 7m3s
Check the version information
[root@k8s-master ~]# kubectl -n calico-system get daemonset calico-node -o jsonpath="{.spec.template.spec.containers[0].image}";echo
docker.io/calico/node:v3.30.2
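With an operator-based install you can also ask the operator for overall component health via the TigeraStatus resources it creates:
kubectl get tigerastatus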
VI. Verify cluster and CNI availability
1. Create a pod
[root@k8s-master ~]# cat test-cni.yaml
apiVersion: v1
kind: Pod
metadata:
  name: xiuxian-v1
spec:
  containers:
  - image: registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/apps:v1
    name: xiuxian
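The creation step itself is not shown above; presumably something like:
kubectl apply -f test-cni.yaml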
2. Verify
[root@k8s-master ~]# kubectl get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
xiuxian-v1 1/1 Running 0 9s 10.100.169.135 k8s-node2 <none> <none>
[root@k8s-master ~]# curl 10.100.169.135
<!DOCTYPE html>
<html><head><meta charset="utf-8"/><title>yinzhengjie apps v1</title><style>div img {width: 900px;height: 600px;margin: 0;}</style></head><body><h1 style="color: green">凡人修仙傳 v1 </h1><div><img src="1.jpg"><div></body></html>
3. Ping test
[root@k8s-master ~]# ping 10.100.169.135
PING 10.100.169.135 (10.100.169.135) 56(84) bytes of data.
64 bytes from 10.100.169.135: icmp_seq=1 ttl=63 time=0.509 ms
64 bytes from 10.100.169.135: icmp_seq=2 ttl=63 time=0.374 ms
^C
--- 10.100.169.135 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1007ms
rtt min/avg/max/mdev = 0.374/0.441/0.509/0.067 ms
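Since CoreDNS was also reset in Part II, an in-cluster DNS lookup makes a good final check (a sketch; the busybox image and tag are assumptions):
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- nslookup kubernetes.default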
With the troubleshooting and adjustments above, most Calico CNI related network problems can be resolved. I hope this post serves as a complete troubleshooting guide for you and for anyone else wrestling with flaky networking in production.