1. Check the cluster status and get the IDs of the leader and the failed node
ETCDCTL_API=3 ./etcdctl --cacert=/etc/kubernetes/ssl/new-ca.pem --cert=/etc/kubernetes/ssl/etcd.pem --key=/etc/kubernetes/ssl/etcd-key.pem --endpoints="https://192.168.7.132:2379,https://192.168.7.134:2379,https://192.168.7.135:2379" endpoint status --write-out=table
Check for the failed node
ETCDCTL_API=3 ./etcdctl --cacert=/etc/kubernetes/ssl/new-ca.pem --cert=/etc/kubernetes/ssl/etcd.pem --key=/etc/kubernetes/ssl/etcd-key.pem --endpoints="https://192.168.7.132:2379,https://192.168.7.134:2379,https://192.168.7.135:2379" endpoint health --write-out=table
2. Remove the failed node from the cluster
ETCDCTL_API=3 etcdctl --endpoints="https://172.16.169.82:2379" --cacert=/etc/etcd/ssl/ca.pem --cert=/etc/etcd/ssl/etcd.pem --key=/etc/etcd/ssl/etcd-key.pem member remove 971a0fee3d275c5
Verify that the member was removed successfully
ETCDCTL_API=3 ./etcdctl --cacert=/etc/kubernetes/ssl/new-ca.pem --cert=/etc/kubernetes/ssl/etcd.pem --key=/etc/kubernetes/ssl/etcd-key.pem --endpoints="https://192.168.7.132:2379,https://192.168.7.134:2379,https://192.168.7.135:2379" member list --write-out=table
3. Re-add the failed node
3.1 Check the old etcd data on the failed node and clear it
rm -rf etcd3.etcd/
3.2 Modify the startup parameters of etcd3, the node that will rejoin the cluster
Edit /etc/etcd/etcd.conf with vi and save the changes.
Change the etcd --initial-cluster-state startup parameter to --initial-cluster-state=existing.
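A minimal sketch of the relevant line, assuming an env-file style /etc/etcd/etcd.conf (the exact key name depends on how your unit file passes options to etcd):
# /etc/etcd/etcd.conf (excerpt, assumed format)
ETCD_INITIAL_CLUSTER_STATE="existing"   # join the existing cluster instead of bootstrapping a new one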
3.3 Copy the certificates from etcd-master
scp -rp etcd1:/etc/etcd/ssl/ /etc/etcd/
After copying, check the owner of the etcd certificates.
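For example, assuming the etcd service runs as the etcd user (adjust the owner to whatever user your service actually uses):
ls -l /etc/etcd/ssl/                  # confirm the files arrived and note their owner
chown -R etcd:etcd /etc/etcd/ssl/     # only if etcd runs as the etcd user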
3.4 Re-add the failed node etcd3
Note: etcd3 is the ETCD_NAME value in etcd.conf.
ETCDCTL_API=3 etcdctl --endpoints="https://172.16.169.82:2379" --cacert=/etc/etcd/ssl/ca.pem --cert=/etc/etcd/ssl/etcd.pem --key=/etc/etcd/ssl/etcd-key.pem member add etcd3 --peer-urls=https://172.16.169.83:2380
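On success, member add typically prints the settings the new member should start with; the member and cluster IDs below are placeholders, but the printed values are worth cross-checking against /etc/etcd/etcd.conf before restarting:
Member 9bf1b35fc7761a23 added to cluster cdf818194e3a8c32    # placeholder IDs
ETCD_NAME="etcd3"
ETCD_INITIAL_CLUSTER="...,etcd3=https://172.16.169.83:2380"  # the full member list is printed here
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://172.16.169.83:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"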
3.5 Restart the service and verify success
systemctl daemon-reload
systemctl restart etcd
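Then confirm the node has rejoined and is healthy, for example (reusing the certificate paths from the member add command and listing whichever endpoints your cluster has):
ETCDCTL_API=3 etcdctl --endpoints="https://172.16.169.82:2379,https://172.16.169.83:2379" --cacert=/etc/etcd/ssl/ca.pem --cert=/etc/etcd/ssl/etcd.pem --key=/etc/etcd/ssl/etcd-key.pem member list --write-out=table
ETCDCTL_API=3 etcdctl --endpoints="https://172.16.169.82:2379,https://172.16.169.83:2379" --cacert=/etc/etcd/ssl/ca.pem --cert=/etc/etcd/ssl/etcd.pem --key=/etc/etcd/ssl/etcd-key.pem endpoint health --write-out=table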
II. Rebuilding the cluster from a backup and restore
Note: this approach can lead to data loss (experienced first-hand), so prefer the first approach whenever possible.
When backing up an etcd cluster, you only need to take the snapshot from one etcd node and then copy it to the other nodes.
When restoring etcd data, the restore command must be run on every node.
Recovery order: stop kube-apiserver -> stop etcd -> restore the data -> start etcd -> start kube-apiserver
The etcd data files live under /var/lib/etcd, which contains two directories (see the example layout after this list):
- snap: stores snapshots; etcd takes snapshots to keep the WAL files from growing without bound, and they hold the state of the etcd data
- wal: stores the write-ahead log, whose main purpose is to record the entire history of data changes; in etcd, every modification must be written to the WAL before it is committed
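A rough illustration of the layout, assuming a default-style data directory (depending on the etcd version the two directories usually sit under a member/ subdirectory):
ls /var/lib/etcd/member/
snap/    # *.snap snapshot files plus the bbolt db file
wal/     # *.wal write-ahead log segments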
1. Check which nodes are healthy
ETCDCTL_API=3 /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/etcd-ca.pem --cert=/opt/etcd/ssl/server.pem --key=/opt/etcd/ssl/server-key.pem --endpoints="https://192.168.15.215:2379,https://192.168.15.216:2379,https://192.168.15.217:2379" endpoint health --write-out=table
2. Back up the etcd data on a healthy node
ETCDCTL_API=3 etcdctl snapshot save backup.db \
--cacert=/opt/etcd/ssl/etcd-ca.pem \
--cert=/opt/etcd/ssl/server.pem \
--key=/opt/etcd/ssl/server-key.pem \
--endpoints="https://192.168.15.215:2379"
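It is worth sanity-checking the snapshot before copying it around; snapshot status reports its hash, revision, total keys, and size (on newer etcd releases the same check is also available via etcdutl):
ETCDCTL_API=3 etcdctl snapshot status backup.db --write-out=table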
3. Copy the backup file to the other etcd nodes
for i in {etcd2,etcd3}; do scp backup.db root@$i:/root; done
4. Restore the etcd data
Before restoring, stop kube-apiserver and etcd and delete the etcd data files:
systemctl stop kube-apiserver    # stop kube-apiserver
systemctl stop etcd              # stop etcd
rm -rf /opt/etcd/data/*          # delete the data directory contents; adjust to your actual data dir
Run the restore command on node 1 as follows:
ETCDCTL_API=3 etcdctl snapshot restore backup.db \
--name etcd-1 \
--initial-cluster="etcd-1=https://192.168.15.215:2380,etcd-2=https://192.168.15.216:2380,etcd-3=https://192.168.15.217:2380" \
--initial-cluster-token=etcd-cluster \
--initial-advertise-peer-urls=https://192.168.15.215:2380 \
--data-dir=/opt/etcd/data
Run the restore command on node 2 as follows:
ETCDCTL_API=3 etcdctl snapshot restore backup.db \
--name etcd-2 \
--initial-cluster="etcd-1=https://192.168.15.215:2380,etcd-2=https://192.168.15.216:2380,etcd-3=https://192.168.15.217:2380" \
--initial-cluster-token=etcd-cluster \
--initial-advertise-peer-urls=https://192.168.15.216:2380 \
--data-dir=/opt/etcd/data
Run the restore command on node 3 as follows:
ETCDCTL_API=3 etcdctl snapshot restore backup.db \
--name etcd-3 \
--initial-cluster="etcd-1=https://192.168.15.215:2380,etcd-2=https://192.168.15.216:2380,etcd-3=https://192.168.15.217:2380" \
--initial-cluster-token=etcd-cluster \
--initial-advertise-peer-urls=https://192.168.15.217:2380 \
--data-dir=/opt/etcd/data
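If your etcd service runs as a non-root user (for example etcd), make sure the restored data directory is owned by that user before starting etcd again; a hedged sketch:
chown -R etcd:etcd /opt/etcd/data    # skip if etcd runs as root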
After the restore completes, start etcd and kube-apiserver again:
systemctl start etcd
systemctl start kube-apiserver
5. Check the cluster status
ETCDCTL_API=3 /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/etcd-ca.pem --cert=/opt/etcd/ssl/server.pem --key=/opt/etcd/ssl/server-key.pem --endpoints="https://192.168.15.215:2379,https://192.168.15.216:2379,https://192.168.15.217:2379" endpoint status --write-out=table