Table of Contents
Introduction
I. ETCD Data Backup
(1) Determine a Backup Strategy
(2) Back Up with the etcdctl Tool
1. Install the etcdctl Command
2. Set the ETCDCTL_API Environment Variable
(3) Perform the Backup
II. Data Restoration
(1) Create a New Resource
(2) Restore the Data
1. Stop the etcd Service and Related K8s Components
2. Back Up the Current Data
3. Restore the Data
4. Restart the Services
III. Verify the Result
Introduction
In a Kubernetes cluster, ETCD is a critical component responsible for storing the cluster's state information and configuration data. From cluster specifications and configuration to the state of running workloads, ETCD holds the key data. Regularly backing up ETCD data and defining a restore strategy is therefore essential to the reliability and data integrity of a Kubernetes cluster. This article walks through how to back up and restore ETCD data in Kubernetes.
I. ETCD Data Backup
(1) Determine a Backup Strategy
Before backing up ETCD data, first decide on a backup strategy: how often to back up, where to store the backups, and how long to retain them. Back up ETCD data on a regular schedule and store copies in multiple secure locations to guard against data loss.
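As a minimal sketch of such a strategy (reusing the certificate paths shown later in this article, with an assumed seven-day retention and daily schedule), a script like the one below can be run from cron to take a timestamped snapshot and prune old copies:
#!/usr/bin/env bash
#etcd-backup.sh -- hypothetical scheduled backup script; adjust paths, certificates, and retention to your cluster
set -euo pipefail
BACKUP_DIR=/opt/etcd/backup
SNAPSHOT="$BACKUP_DIR/etcdbackup-$(date +%Y%m%d-%H%M%S).db"
mkdir -p "$BACKUP_DIR"
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save "$SNAPSHOT"
#Delete snapshots older than the assumed 7-day retention period
find "$BACKUP_DIR" -name 'etcdbackup-*.db' -mtime +7 -delete
#Example cron entry (daily at 02:00): 0 2 * * * /usr/local/sbin/etcd-backup.sh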
(2) Back Up with the etcdctl Tool
1. Install the etcdctl Command
Download the etcdctl binary. etcdctl is etcd's command-line client, used to interact with an etcd cluster.
[root@master01 manifests]#cat etcd.yaml |grep image:
image: k8s.gcr.io/etcd:3.4.13-0
#Check the ETCD version first
[root@master01 mnt]#wget https://github.com/etcd-io/etcd/releases/download/v3.4.13/etcd-v3.4.13-linux-amd64.tar.gz
#Download the release package matching that version
[root@master01 mnt]#ls
etcd-v3.4.13-linux-amd64.tar.gz
[root@master01 mnt]#tar xf etcd-v3.4.13-linux-amd64.tar.gz
[root@master01 mnt]#ls
etcd-v3.4.13-linux-amd64 etcd-v3.4.13-linux-amd64.tar.gz
[root@master01 mnt]#ls etcd-v3.4.13-linux-amd64
Documentation etcd etcdctl README-etcdctl.md README.md READMEv2-etcdctl.md
[root@master01 mnt]#mv etcd-v3.4.13-linux-amd64/etcdctl /usr/local/sbin/
[root@master01 mnt]#etcdctl version
etcdctl version: 3.4.13
API version: 3.4
2. Set the ETCDCTL_API Environment Variable
The ETCDCTL_API environment variable specifies which version of the etcd API etcdctl uses when talking to the etcd cluster. As of etcd v3.4, etcdctl defaults to the v3 API, but if you need to interact with an older etcd cluster you may have to set this variable explicitly
[root@master01 mnt]#echo "export ETCDCTL_API=3" >> ~/.bashrc
#Use export so that etcdctl actually sees the variable in its environment
[root@master01 mnt]#bash
[root@master01 mnt]#echo "$ETCDCTL_API"
3
(3) Perform the Backup
[root@master01 mnt]#mkdir /opt/etcd/backup -p
#Create a directory to hold the backup files
[root@master01 mnt]#ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key snapshot save /opt/etcd/backup/etcdbackup.db
{"level":"info","ts":1718807742.198743,"caller":"snapshot/v3_snapshot.go:119","msg":"created temporary db file","path":"/opt/etcd/backup/etcdbackup.db.part"}
{"level":"info","ts":"2024-06-19T22:35:42.238+0800","caller":"clientv3/maintenance.go:200","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":1718807742.238828,"caller":"snapshot/v3_snapshot.go:127","msg":"fetching snapshot","endpoint":"https://127.0.0.1:2379"}
{"level":"info","ts":"2024-06-19T22:35:42.945+0800","caller":"clientv3/maintenance.go:208","msg":"completed snapshot read; closing"}
{"level":"info","ts":1718807742.9925601,"caller":"snapshot/v3_snapshot.go:142","msg":"fetched snapshot","endpoint":"https://127.0.0.1:2379","size":"17 MB","took":0.793473218}
{"level":"info","ts":1718807743.0122747,"caller":"snapshot/v3_snapshot.go:152","msg":"saved","path":"/opt/etcd/backup/etcdbackup.db"}
Snapshot saved at /opt/etcd/backup/etcdbackup.db
[root@master01 mnt]#ll -h /opt/etcd/backup/etcdbackup.db
-rw------- 1 root root 17M Jun 19 22:35 /opt/etcd/backup/etcdbackup.db
#In the backup command above, --endpoints specifies the address used to reach etcd, while --cacert, --cert, and --key specify etcd's CA certificate, client certificate, and private key respectively
[root@master01 mnt]#ETCDCTL_API=3 etcdctl --write-out=table snapshot status /opt/etcd/backup/etcdbackup.db
#Verify the snapshot, printing its status as a table
+----------+----------+------------+------------+
| HASH | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| 9feffbe0 | 1544178 | 1131 | 17 MB |
+----------+----------+------------+------------+
II. Data Restoration
(1) Create a New Resource
[root@master01 ~]#kubectl run nginx --image=nginx:1.18.0
pod/nginx created
[root@master01 ~]#kubectl get pod
NAME READY STATUS RESTARTS AGE
nginx 1/1 Running 0 3s
#Create a new resource; if it disappears after the backup is restored, the restore succeeded
(2) Restore the Data
1. Stop the etcd Service and Related K8s Components
Before restoring, stop the etcd service and the related K8s control-plane components (apiserver, controller-manager, scheduler, etc.). Since etcd is deployed as a static Pod, you can stop every service launched from the YAML files under /etc/kubernetes/manifests/ by moving those files out of the directory (or renaming the directory itself)
[root@master01 ~]#mkdir /opt/backup/ -p
[root@master01 ~]#ls /opt/backup/
[root@master01 ~]#mv /etc/kubernetes/manifests/* /opt/backup/
[root@master01 ~]#kubectl get pod -n kube-system
The connection to the server 192.168.83.30:6443 was refused - did you specify the right host or port?
2. Back Up the Current Data
Before restoring, it is recommended to back up the current etcd data directory, so you can roll back if anything goes wrong during the restore.
[root@master01 ~]#mv /var/lib/etcd /var/lib/etcd.bck
3. Restore the Data
Use etcdctl's snapshot restore command to restore data from the backup file, specifying the target data directory and the other related parameters
[root@master01 ~]#ETCDCTL_API=3 etcdctl snapshot restore /opt/etcd/backup/etcdbackup.db --name etcd-master01 --data-dir /var/lib/etcd --initial-cluster etcd-master01=https://192.168.83.30:2380 --initial-cluster-token etcd-cluster-token --initial-advertise-peer-urls https://192.168.83.30:2380
{"level":"info","ts":1718892798.584346,"caller":"snapshot/v3_snapshot.go:296","msg":"restoring snapshot","path":"/opt/etcd/backup/etcdbackup.db","wal-dir":"/var/lib/etcd/member/wal","data-dir":"/var/lib/etcd","snap-dir":"/var/lib/etcd/member/snap"}
{"level":"info","ts":1718892798.9371617,"caller":"mvcc/kvstore.go:380","msg":"restored last compact revision","meta-bucket-name":"meta","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":1543520}
{"level":"info","ts":1718892798.9976206,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"29d76e10db177ccd","local-member-id":"0","added-peer-id":"a6f5c4f9af0db4c1","added-peer-peer-urls":["https://192.168.83.30:2380"]}
{"level":"info","ts":1718892799.0065427,"caller":"snapshot/v3_snapshot.go:309","msg":"restored snapshot","path":"/opt/etcd/backup/etcdbackup.db","wal-dir":"/var/lib/etcd/member/wal","data-dir":"/var/lib/etcd","snap-dir":"/var/lib/etcd/member/snap"}#----------------------------------------------------------------------------------------------------------
etcdctl snapshot restore 命令用于從 etcd 的快照文件中恢復數據。在你給出的命令中,有一些參數需要被替換為具體的值來匹配你的 etcd 集群配置。以下是每個參數的解釋和應該替換為什么:ETCDCTL_API=3
#這個環境變量告訴 etcdctl 使用 etcd API 的第 3 版本。通常,你可以直接在命令行前設置這個環境變量,或者在你的 shell 配置文件中設置它。etcdctl snapshot restore /opt/etcd/backup/etcdbackup.db
#/opt/etcd/backup/etcdbackup.db 是你要從中恢復數據的 etcd 快照文件的路徑。確保這個路徑是正確的,并且文件是可讀的。--name etcd-master01
#--name 參數定義了 etcd 實例的名稱。在你的例子中,它被設置為 etcd-master01。這通常與集群中的特定節點相關聯。--data-dir /var/lib/etcd
#--data-dir 參數指定了 etcd 存儲其數據的目錄。在恢復過程中,這個目錄將被用于存儲恢復的數據。確保這個目錄是可寫的,并且沒有重要的數據,因為恢復過程可能會覆蓋它。--initial-cluster etcd-master01=https://192.168.83.30:2380
#--initial-cluster 參數定義了 etcd 集群的初始成員列表。在恢復過程中,你需要指定集群中所有節點的名稱和它們的客戶端 URL。--initial-cluster-token etcd-cluster-token
#--initial-cluster-token 參數用于 etcd 集群中的節點在初次啟動時相互發現。它應該是一個唯一的字符串,用于你的 etcd 集群。確保所有節點在啟動時都使用相同的集群令牌。--initial-advertise-peer-urls https://192.168.83.30:2380
#--initial-advertise-peer-urls 參數指定了 etcd 節點在集群中用于通信的 URL。這通常是節點的對等體(peer)URL。
Note
If you have multiple etcd nodes, make sure --initial-cluster lists every node, and that each node's --name, --initial-advertise-peer-urls, and other related parameters are configured correctly (see the sketch after these notes).
Before restoring, back up the existing etcd data directory (if there is one), in case anything goes wrong during the restore.
Make sure every node in the etcd cluster is configured correctly and the network between them is reachable, so the nodes can communicate with one another.
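For the multi-node case, here is a minimal sketch of restoring the same snapshot across three masters. The second and third node names and IP addresses (etcd-master02/03, 192.168.83.31/32) are hypothetical and must be adapted to your cluster; each node runs the command locally with its own --name and peer URL.
#Run on etcd-master01; repeat on each node with its own --name and --initial-advertise-peer-urls
ETCDCTL_API=3 etcdctl snapshot restore /opt/etcd/backup/etcdbackup.db \
  --name etcd-master01 \
  --data-dir /var/lib/etcd \
  --initial-cluster etcd-master01=https://192.168.83.30:2380,etcd-master02=https://192.168.83.31:2380,etcd-master03=https://192.168.83.32:2380 \
  --initial-cluster-token etcd-cluster-token \
  --initial-advertise-peer-urls https://192.168.83.30:2380
#On etcd-master02, change --name to etcd-master02 and the advertised peer URL to https://192.168.83.31:2380, and likewise for etcd-master03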
4. Restart the Services
Move the manifest files back into /etc/kubernetes/manifests/ so that the kubelet restarts the related K8s components.
[root@master01 ~]#mv /opt/backup/* /etc/kubernetes/manifests/
[root@master01 ~]#kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-74ff55c5b-dwzdp 1/1 Running 10 35d
coredns-74ff55c5b-ws8c8 1/1 Running 10 35d
etcd-master01 1/1 Running 10 24h
kube-apiserver-master01 1/1 Running 5 24h
kube-controller-manager-master01 1/1 Running 45 24h
kube-proxy-58zbl 1/1 Running 0 4d7h
kube-proxy-9v7jw 1/1 Running 0 4d7h
kube-proxy-xdgb4 1/1 Running 0 4d7h
kube-scheduler-master01 1/1 Running 48 24h
[root@master01 ~]#kubectl get pod
No resources found in default namespace.
#The nginx pod created earlier is gone while the other services are running normally, proving the data was restored to the state of the backup
III. Verify the Result
1. Delete an svc Resource
[root@master01 ~]#kubectl get svc -A
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 35d
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 35d
[root@master01 ~]#kubectl delete service kubernetes
service "kubernetes" deleted
[root@master01 ~]#kubectl get svc -A
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 35d
2. Stop ETCD and the K8s Components
[root@master01 ~]#mv /etc/kubernetes/manifests/* /opt/backup/
[root@master01 ~]#kubectl get all -A
The connection to the server 192.168.83.30:6443 was refused - did you specify the right host or port?
The connection to the server 192.168.83.30:6443 was refused - did you specify the right host or port?
The connection to the server 192.168.83.30:6443 was refused - did you specify the right host or port?
3. Restore the Data
[root@master01 ~]#rm -rf /var/lib/etcd
#Since a backup already exists, the directory can simply be deleted; if the data has changed since the backup, take a fresh one first
[root@master01 ~]#ETCDCTL_API=3 etcdctl snapshot restore /opt/etcd/backup/etcdbackup.db --name etcd-master01 --data-dir /var/lib/etcd --initial-cluster etcd-master01=https://192.168.83.30:2380 --initial-cluster-token etcd-cluster-token --initial-advertise-peer-urls https://192.168.83.30:2380
{"level":"info","ts":1718893339.6779017,"caller":"snapshot/v3_snapshot.go:296","msg":"restoring snapshot","path":"/opt/etcd/backup/etcdbackup.db","wal-dir":"/var/lib/etcd/member/wal","data-dir":"/var/lib/etcd","snap-dir":"/var/lib/etcd/member/snap"}
{"level":"info","ts":1718893339.8944564,"caller":"mvcc/kvstore.go:380","msg":"restored last compact revision","meta-bucket-name":"meta","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":1543520}
{"level":"info","ts":1718893339.9436255,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"29d76e10db177ccd","local-member-id":"0","added-peer-id":"a6f5c4f9af0db4c1","added-peer-peer-urls":["https://192.168.83.30:2380"]}
{"level":"info","ts":1718893339.9502227,"caller":"snapshot/v3_snapshot.go:309","msg":"restored snapshot","path":"/opt/etcd/backup/etcdbackup.db","wal-dir":"/var/lib/etcd/member/wal","data-dir":"/var/lib/etcd","snap-dir":"/var/lib/etcd/member/snap"}
[root@master01 ~]#mv /opt/backup/* /etc/kubernetes/manifests/
[root@master01 ~]#kubectl get all -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-flannel pod/kube-flannel-ds-8sgt8 1/1 Running 1 23d
kube-flannel pod/kube-flannel-ds-nplmm 1/1 Running 12 35d
kube-flannel pod/kube-flannel-ds-xwklx 1/1 Running 3 23d
kube-system pod/coredns-74ff55c5b-dwzdp 1/1 Running 10 35d
kube-system pod/coredns-74ff55c5b-ws8c8 1/1 Running 10 35d
kube-system pod/etcd-master01 1/1 Running 0 25h
kube-system pod/kube-apiserver-master01 1/1 Running 0 25h
kube-system pod/kube-controller-manager-master01 1/1 Running 0 25h
kube-system pod/kube-proxy-58zbl 1/1 Running 0 4d7h
kube-system pod/kube-proxy-9v7jw 1/1 Running 0 4d7h
kube-system pod/kube-proxy-xdgb4 1/1 Running 0 4d7h
kube-system pod/kube-scheduler-master01 1/1 Running 0 25h
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 35d
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 35d
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-flannel daemonset.apps/kube-flannel-ds 3 3 3 3 3 <none> 35d
kube-system daemonset.apps/kube-proxy 3 3 3 3 3 kubernetes.io/os=linux 35d
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
kube-system deployment.apps/coredns 2/2 2 2 35d
NAMESPACE NAME DESIRED CURRENT READY AGE
kube-system replicaset.apps/coredns-74ff55c5b 2 2 2 35d
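#The service/kubernetes resource deleted earlier has reappeared, confirming the cluster state was restored from the snapshot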