I. Version compatibility and server planning
Component | Version / configuration | Notes |
---|---|---|
Operating system | Anolis OS 8.9 | Based on Linux 5.10.134-17.3.an8.x86_64 |
Kernel | Linux 5.10.134-17.3.an8.x86_64 | Compatible with Kubernetes 1.29 |
Architecture | x86-64 | |
Kubernetes | v1.29.5 | Stable 1.29 patch release; compatible with the Linux 5.10 kernel |
Docker | 24.0.7 | Requires the systemd cgroup driver |
Calico | v3.27.3 | Supports Kubernetes 1.29 on x86-64 |
Dashboard | v2.7.0 | Latest 2.x release |
Master server IP | 192.168.153.200 | Control-plane node |
Node1 server IP | 192.168.153.201 | Worker node 1 |
Node2 server IP | 192.168.153.202 | Worker node 2 |
II. Environment preparation (run on all nodes)
1. Set the hostname (the hosts file is updated in step 5)
(master node only)
hostnamectl set-hostname master
(node1 only)
hostnamectl set-hostname node1
(node2 only)
hostnamectl set-hostname node2
2. Disable the firewall and SELinux
sudo systemctl disable --now firewalld
sudo setenforce 0
sudo sed -i 's/^SELINUX=.*/SELINUX=disabled/g' /etc/selinux/config
3. Disable swap
sudo swapoff -a
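sudo sed -ri '/swap/s/^/#/' /etc/fstab
4. Configure kernel parameters
# The net.bridge.* keys only exist once the br_netfilter module is loaded, so load it now and at every boot
sudo modprobe br_netfilter
echo br_netfilter | sudo tee /etc/modules-load.d/k8s.conf
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system
5. Configure hostname resolution (all nodes)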
sudo tee -a /etc/hosts <<EOF
192.168.153.200 k8s-endpoint
192.168.153.200 master
192.168.153.201 node1
192.168.153.202 node2
EOF
6. Time synchronization
sudo dnf install chrony -y
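sudo systemctl enable --now chronyd
7. Add color and a timestamp to the shell prompt (personal preference)
(1) Open ~/.bashrc:
vim ~/.bashrc
(2) Find or add the following line to set PS1, the variable that defines the prompt:
export PS1='\[\e[0;92m\][\u@\h \t]# \[\e[0m\]'
(3) Save and close the file.
:wq
(4) Apply the change:
source ~/.bashrc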
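The kernel parameters from step 4 are easy to get subtly wrong, for example a typo in a key or a value left at 0. The following is a minimal sketch of a check, assuming the "key = value" layout written in step 4. The function name missing_sysctl_keys is hypothetical and not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: print each required key that is NOT set to 1 in the
# given sysctl file. Prints nothing when all three settings are present.
missing_sysctl_keys() {
  for key in net.bridge.bridge-nf-call-ip6tables \
             net.bridge.bridge-nf-call-iptables \
             net.ipv4.ip_forward; do
    # Split each line on "=", strip blanks, and look for "key = 1".
    awk -F= -v k="$key" '
      { gsub(/[ \t]/, "", $1); gsub(/[ \t]/, "", $2) }
      $1 == k && $2 == "1" { found = 1 }
      END { exit !found }
    ' "$1" || echo "$key"
  done
}

# Only check the real file when it exists on this machine.
if [ -r /etc/sysctl.d/k8s.conf ]; then
  missing_sysctl_keys /etc/sysctl.d/k8s.conf
fi
```

An empty result means the file carries all three settings. It checks the file only, not the values currently loaded in the kernel; `sysctl net.ipv4.ip_forward` shows the live value.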
III. Install Docker (all nodes)
# Remove Podman and its related packages, which tend to conflict with Docker
sudo dnf remove podman buildah skopeo catatonit --nobest -y
# Remove leftover dependencies
sudo dnf autoremove
# Clear the old package cache
sudo dnf clean all
# Add the Docker repository (Alibaba Cloud mirror)
sudo dnf config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
# Install the pinned Docker version
sudo dnf install -y docker-ce-24.0.7 docker-ce-cli-24.0.7 containerd.io
# Configure the Docker daemon
sudo mkdir -p /etc/docker
cat <<EOF | sudo tee /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "registry-mirrors": ["https://docker.mirrors.ustc.edu.cn/"]
}
EOF
# Enable Docker at boot and start it now
sudo systemctl enable --now docker
IV. Install the Kubernetes components (all nodes)
# Add the Kubernetes repository
sudo tee /etc/yum.repos.d/kubernetes.repo <<EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.29/rpm/
enabled=1
gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.29/rpm/repodata/repomd.xml.key
EOF
# Install kubeadm, kubelet and kubectl (pinned versions)
sudo dnf install -y kubelet-1.29.5 kubeadm-1.29.5 kubectl-1.29.5
# Enable kubelet at boot (do not start it yet)
sudo systemctl enable kubelet
V. Initialize the master node (master only)
# Initialization command
sudo kubeadm init \
  --apiserver-advertise-address=192.168.153.200 \
  --control-plane-endpoint=192.168.153.200 \
  --image-repository registry.cn-hangzhou.aliyuncs.com/google_containers \
  --kubernetes-version v1.29.5 \
  --service-cidr=10.96.0.0/16 \
  --pod-network-cidr=172.20.0.0/16
# Configure kubectl
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
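The same init flags can be kept in a kubeadm configuration file, which is easier to review and re-run than a long command line. The following is a sketch, assuming the kubeadm.k8s.io/v1beta3 config API used by kubeadm 1.29. The file name kubeadm-config.yaml is an arbitrary choice.

```shell
#!/bin/sh
# Write a kubeadm configuration equivalent to the init flags above.
# The file name is an arbitrary choice for this sketch.
cat > kubeadm-config.yaml <<'EOF'
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.153.200
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.29.5
controlPlaneEndpoint: "192.168.153.200"
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
networking:
  serviceSubnet: 10.96.0.0/16
  podSubnet: 172.20.0.0/16
EOF

# Then, on the master only:
#   sudo kubeadm init --config kubeadm-config.yaml
```

Keeping the file under version control makes it possible to rebuild the control plane with exactly the same CIDRs later.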
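(If initialization fails) Check and repair the containerd configuration (all nodes)
# Kubernetes 1.24 and later no longer include dockershim, so the kubelet talks to containerd through its CRI plugin rather than to Docker.
# The config.toml shipped with the containerd.io package disables that plugin, which is a common cause of kubeadm init failing.
Step 1: Check and repair the containerd configuration
# Move the existing containerd config file out of the way (in case an old config conflicts):
sudo mv /etc/containerd/config.toml /root/config.toml
# Regenerate the default containerd config file:
containerd config default | sudo tee /etc/containerd/config.toml
Step 2: Enable the CRI plugin. Edit /etc/containerd/config.toml and make sure the following setting is present:
vim /etc/containerd/config.toml
Point the sandbox image at the Alibaba Cloud registry:
sandbox_image = "registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6"
Step 3: Restart the containerd service:
sudo systemctl restart containerd
Step 4: Verify the containerd service status
sudo systemctl status containerd
# The output should include "Active: active (running)"
Step 5: Kubernetes 1.29.5 requires containerd 1.6.0 or later. Check the version with: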
containerd --version
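The config.toml edit in steps 1 to 3 can be scripted instead of done by hand. The sketch below rewrites sandbox_image and also sets SystemdCgroup = true. That second change is not part of the steps above and is an assumption: kubeadm configures the kubelet for the systemd cgroup driver by default, while the generated containerd config ships with SystemdCgroup = false. The function name patch_containerd_config is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: patch a containerd config.toml in place.
#   $1 = path to config.toml
#   $2 = sandbox (pause) image to use
patch_containerd_config() {
  # Keep everything up to "=" (including indentation) and replace the value.
  sed -e "s|^\([[:space:]]*sandbox_image[[:space:]]*=\).*|\1 \"$2\"|" \
      -e 's|^\([[:space:]]*SystemdCgroup[[:space:]]*=\).*|\1 true|' \
      "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Usage on a node (needs root), followed by a restart:
#   patch_containerd_config /etc/containerd/config.toml \
#     registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6
#   systemctl restart containerd
```

`kubeadm config images list` prints the pause tag that kubeadm itself expects, which may differ from the 3.6 used here.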
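VI. Join the worker nodes to the cluster (run on node1/node2)
# Use the kubeadm join command printed when the master finished initializing, for example:
kubeadm join k8s-endpoint:6443 --token xxxx.xxxxxxxxxxxx \
  --discovery-token-ca-cert-hash sha256:xxxxxxxx...
# List the current tokens (run on the master):
kubeadm token list
# This lists the existing tokens. If none is valid, or a new one is needed, continue with the next step.
# Generate a new token if there is none or it has expired (run on the master):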
kubeadm token create --print-join-command
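If only the token is at hand, the value for --discovery-token-ca-cert-hash can be recomputed on the master from the cluster CA certificate. The sketch below wraps the openssl pipeline from the kubeadm documentation in a hypothetical function named ca_cert_hash. It assumes an RSA CA key, which is what kubeadm generates by default.

```shell
#!/bin/sh
# Hypothetical helper: print the SHA-256 hash of the CA public key, in the
# form expected after "sha256:" in kubeadm join.
#   $1 = path to the CA certificate
#        (on a kubeadm master: /etc/kubernetes/pki/ca.crt)
ca_cert_hash() {
  openssl x509 -pubkey -noout -in "$1" \
    | openssl rsa -pubin -outform der 2>/dev/null \
    | openssl dgst -sha256 -hex \
    | sed 's/^.* //'
}

# Only run against the real CA when it exists and is readable.
if [ -r /etc/kubernetes/pki/ca.crt ]; then
  echo "sha256:$(ca_cert_hash /etc/kubernetes/pki/ca.crt)"
fi
```

In most cases `kubeadm token create --print-join-command` is simpler, since it prints the token and the hash together.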
VII. Deploy the Calico network plugin (master only)
# Download the Calico manifest for the matching version
curl -O https://raw.githubusercontent.com/projectcalico/calico/v3.27.3/manifests/calico.yaml
# Change the pod CIDR to match the kubeadm --pod-network-cidr value
# (in the stock manifest the CALICO_IPV4POOL_CIDR lines are commented out; uncomment them for the change to take effect)
sed -i 's/192.168.0.0\/16/172.20.0.0\/16/' calico.yaml
# Switch to the Alibaba Cloud image registry
sed -i 's|docker.io/calico/|registry.aliyuncs.com/calico/|g' calico.yaml
# If Alibaba Cloud does not work, try this mirror in China: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/calico
sed -i 's|registry.aliyuncs.com/calico/|swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/calico/|g' calico.yaml
[root@master 11:36:19]# cat calico.yaml | grep image:
image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/calico/cni:v3.27.3
image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/calico/cni:v3.27.3
image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/calico/node:v3.27.3
image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/calico/node:v3.27.3
image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/calico/kube-controllers:v3.27.3
# Apply the manifest
kubectl apply -f calico.yaml
# Verify the network status
[root@master 11:40:14]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-6b44cbc54d-fdxjk 1/1 Running 0 109m
calico-node-7q9pl 1/1 Running 0 109m
calico-node-kwtrj 1/1 Running 0 109m
calico-node-vwskq 1/1 Running 0 109m
coredns-5f98f8d567-5lldq 1/1 Running 0 123m
coredns-5f98f8d567-j5874 1/1 Running 0 123m
etcd-master 1/1 Running 0 123m
kube-apiserver-master 1/1 Running 0 123m
kube-controller-manager-master 1/1 Running 1 (115m ago) 123m
kube-proxy-96xr5 1/1 Running 0 123m
kube-proxy-f9wl6 1/1 Running 0 120m
kube-proxy-sqfrh 1/1 Running 0 118m
kube-scheduler-master 1/1 Running 1 (115m ago) 123m
# Check the node status
[root@master 11:44:38]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master Ready control-plane 124m v1.29.5
node1 Ready <none> 121m v1.29.5
node2 Ready <none> 118m v1.29.5
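Before applying a manifest it helps to confirm that every image now points at the intended registry. The following small sketch lists the distinct images in a manifest, which is shorter to read than the raw grep output shown earlier in this section. manifest_images is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the distinct container images referenced by
# "image:" lines in a Kubernetes manifest.
#   $1 = path to the manifest
manifest_images() {
  grep -E '^[[:space:]]*(-[[:space:]]+)?image:' "$1" \
    | sed -E 's/^[[:space:]]*(-[[:space:]]+)?image:[[:space:]]*//; s/"//g' \
    | sort -u
}

# Only inspect calico.yaml when it is present in the current directory.
if [ -r calico.yaml ]; then
  manifest_images calico.yaml
fi
```

Any line in the result that still starts with docker.io/ means one of the sed substitutions did not match.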
VIII. Deploy the Dashboard (master only)
# Download the official manifest
wget https://raw.githubusercontent.com/kubernetes/dashboard/v2.7.0/aio/deploy/recommended.yaml
# Change the Service type to NodePort so that the service is reachable through a node's IP on a fixed port,
# and add nodePort: 30001, the port exposed on each node for external access.
vim recommended.yaml
[root@master 13:20:28]# grep -A 7 'spec:' recommended.yaml | head -n 8
spec:
  type: NodePort
  ports:
    - port: 443
      targetPort: 8443
      nodePort: 30001
  selector:
    k8s-app: kubernetes-dashboard
# Replace the images with the Alibaba Cloud registry, registry.cn-hangzhou.aliyuncs.com/google_containers
[root@master 13:09:40]# cat recommended.yaml | grep image:
image: registry.cn-hangzhou.aliyuncs.com/google_containers/dashboard:v2.7.0
image: registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-scraper:v1.0.8
# If Alibaba Cloud is unavailable, try this mirror in China: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/kubernetesui
[root@master 11:45:26]# cat recommended.yaml | grep image:
image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/kubernetesui/dashboard:v2.7.0
image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/kubernetesui/metrics-scraper:v1.0.8
# Deploy
kubectl apply -f recommended.yaml
# Check the status of all pods.
[root@master 12:59:29]# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-6b44cbc54d-fdxjk 1/1 Running 0 3h4m
kube-system calico-node-7q9pl 1/1 Running 0 3h4m
kube-system calico-node-kwtrj 1/1 Running 0 3h4m
kube-system calico-node-vwskq 1/1 Running 0 3h4m
kube-system coredns-5f98f8d567-5lldq 1/1 Running 0 3h18m
kube-system coredns-5f98f8d567-j5874 1/1 Running 0 3h18m
kube-system etcd-master 1/1 Running 0 3h18m
kube-system kube-apiserver-master 1/1 Running 0 3h18m
kube-system kube-controller-manager-master 1/1 Running 1 (3h10m ago) 3h18m
kube-system kube-proxy-96xr5 1/1 Running 0 3h18m
kube-system kube-proxy-f9wl6 1/1 Running 0 3h15m
kube-system kube-proxy-sqfrh 1/1 Running 0 3h13m
kube-system kube-scheduler-master 1/1 Running 1 (3h10m ago) 3h18m
kubernetes-dashboard dashboard-metrics-scraper-bd84c9d8b-x2gmj 1/1 Running 0 136m
kubernetes-dashboard kubernetes-dashboard-5cc694d9b-825gq 1/1 Running 0 136m
# Check the resources in the kubernetes-dashboard namespace
kubectl get pods,svc -n kubernetes-dashboard
[root@master 12:59:29]# kubectl get pods,svc -n kubernetes-dashboard
NAME READY STATUS RESTARTS AGE
pod/dashboard-metrics-scraper-bd84c9d8b-x2gmj 1/1 Running 0 136m
pod/kubernetes-dashboard-5cc694d9b-825gq 1/1 Running 0 136m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/dashboard-metrics-scraper ClusterIP 10.96.100.159 <none> 8000/TCP 136m
service/kubernetes-dashboard NodePort 10.96.67.20 <none> 443:30001/TCP 136m
1. Get the node IPs
kubectl get nodes -o wide
2. Open the Dashboard in a browser (note that HTTPS is required):
https://192.168.153.200:30001
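# Bypass the certificate warning (development environments only)
Chrome: type thisisunsafe anywhere on the page (no Enter needed).
Firefox: click Advanced -> Accept the Risk and Continue.
3. Create an administrator account (if not created already):
kubectl create serviceaccount dashboard-admin -n kubernetes-dashboard
kubectl create clusterrolebinding dashboard-admin --clusterrole=cluster-admin --serviceaccount=kubernetes-dashboard:dashboard-admin
4. Get the token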
kubectl -n kubernetes-dashboard get secret dashboard-admin-token -o go-template='{{.data.token | base64decode}}'
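If fetching the token fails or returns garbled output:
# Since Kubernetes 1.24 a token Secret is no longer created automatically for a ServiceAccount, so the Secret queried above may not exist yet.
# Step 1: Check which Secret is associated with the ServiceAccount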
[root@master 12:59:31]# kubectl -n kubernetes-dashboard describe sa dashboard-admin
Name: dashboard-admin
Namespace: kubernetes-dashboard
Labels: <none>
Annotations: <none>
Image pull secrets: <none>
Mountable secrets: <none>
Tokens: dashboard-admin-token
Events: <none>
# Step 2: Manually create a Secret bound to the ServiceAccount (needed if no Secret was generated)
kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: dashboard-admin-token
  namespace: kubernetes-dashboard
  annotations:
    kubernetes.io/service-account.name: dashboard-admin
type: kubernetes.io/service-account-token
EOF
# Step 3: Verify that the Secret contains a token field:
kubectl -n kubernetes-dashboard get secret dashboard-admin-token -o jsonpath='{.data.token}'
# If the output is empty (null), delete the old Secret and create it again:
kubectl -n kubernetes-dashboard delete secret dashboard-admin-token
kubectl apply -f <the YAML file above>
# Step 4: Get the token
kubectl -n kubernetes-dashboard get secret dashboard-admin-token -o go-template='{{.data.token | base64decode}}'
[root@master 12:59:30]# kubectl -n kubernetes-dashboard get secret dashboard-admin-token -o go-template='{{.data.token | base64decode}}'
eyJhbGciOiJSUzI1NiIsImtpZCI6IlBINlNjODNwR1duZWR4TWVfV3pkRWZsTG1UUzJxZGRTb1pyTHBrNkFZRUUifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkYXNoYm9hcmQtYWRtaW4tdG9rZW4iLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoiZGFzaGJvYXJkLWFkbWluIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiOWVhMWY0OGUtZDNjYS00ZjViLWEzODAtN2U3MjE3MGRiYTdmIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Omt1YmVybmV0ZXMtZGFzaGJvYXJkOmRhc2hib2FyZC1hZG1pbiJ9.tyg0bnwtzbdR7bqGOPZ8ZgkfDvVd9ZpSoeDCX1qbMM4SgUu6mdlbd5UTNdR-Yq6e-F3TzVHfobkSfWvLBumdRjTPj9qvDedzPhl2nB8vdx2VNE4dvkyJ_OlB3MqJdFH9wuzU93ovRbbOULjTnTm2AOUWck1eJFw8YVmbgHmx4xnfLSlcSFOIbeJmhm1rPGZlsRDQgIlcnVAhPkPpuBO21wrLtzQwL0D6aVGxRaNXQMhlj1lqz-duaXd6aK7kkXQvO1M4xJoktmT2Ey-JDf9fygt7AP2saC86KRWK0B3drRkNNkSFeZ9VDhoPPf6KsZ9hG1zVUjUOpZFTED6zDZ0PMw
# Make sure the Dashboard service is exposed correctly (for example on NodePort 30001), then open it in a browser:
[root@master 12:59:31]# kubectl get svc -n kubernetes-dashboard | grep NodePort
kubernetes-dashboard NodePort 10.96.67.20 <none> 443:30001/TCP 136m
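The image replacement in recommended.yaml is shown earlier in this section only through its grep result. The following is one way to script it, assuming the upstream manifest references the images as kubernetesui/<name>:<tag>. The function name rewrite_dashboard_images is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: point the Dashboard images at another registry.
# Assumes the manifest contains lines of the form
#   image: kubernetesui/<name>:<tag>
#   $1 = manifest path
#   $2 = registry prefix without a trailing slash
rewrite_dashboard_images() {
  sed "s|image: kubernetesui/|image: $2/|" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Usage, before kubectl apply:
#   rewrite_dashboard_images recommended.yaml \
#     registry.cn-hangzhou.aliyuncs.com/google_containers
# or, for the alternative mirror:
#   rewrite_dashboard_images recommended.yaml \
#     swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/kubernetesui
```

Running it a second time changes nothing, because the kubernetesui/ prefix no longer appears directly after "image: ".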
IX. Troubleshooting and operations
# Restart Docker
systemctl restart docker
# Restart kubelet
systemctl restart kubelet
# Check the Docker status
systemctl status docker
# Check the kubelet status
systemctl status kubelet
# Check the status of all pods
kubectl get pods -A
# Check the kubernetes-dashboard resources
kubectl get pods,svc -n kubernetes-dashboard
Issue 1: CoreDNS is stuck in CrashLoopBackOff and keeps restarting
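# Edit the Corefile and delete the loop line to avoid an internal forwarding loop
kubectl edit -n kube-system cm coredns
# After changing the ConfigMap, delete the coredns pods so that they are recreated; they then return to normal
# (the pod names below come from the author's environment; substitute your own)
kubectl delete -n kube-system pod coredns-59799fb945-tcjsl
kubectl delete -n kube-system pod coredns-59799fb945-zlqkt
Issue 2: After deploying Calico, calico-node shows READY 0/1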
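# Cause: the master node has several network interfaces and Calico picked the wrong one.
# Edit calico.yaml and specify the correct interface name.
vim calico.yaml
# Auto-detect the BGP IP address.
- name: IP_AUTODETECTION_METHOD
  value: "interface=ens160"
Issue 3: Configuring the port exposed on the nodes
vim recommended.yaml
Issue 4: Kubernetes Dashboard does not show CPU and memory data.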
# Download Metrics Server
wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Add the flag that skips kubelet TLS verification (works around certificate errors)
vim components.yaml
Add the marked line under:
containers:
- args:
  - --cert-dir=/tmp
  - --secure-port=10250
  - --kubelet-insecure-tls # add this line
# Switch to the Alibaba Cloud registry
sed -i 's|registry.k8s.io/metrics-server/metrics-server|registry.aliyuncs.com/google_containers/metrics-server|g' components.yaml
# Deploy
kubectl apply -f components.yaml
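The manual edit of components.yaml above (done before kubectl apply) can also be scripted. The sketch below inserts the flag directly after the --secure-port argument and reuses that line's indentation, so the YAML stays aligned. The function name add_insecure_tls_flag is hypothetical, and the approach assumes the manifest contains a "- --secure-port=" line, as in the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: add --kubelet-insecure-tls after the --secure-port
# argument, reusing that line's indentation. Does nothing if already present.
#   $1 = path to components.yaml
add_insecure_tls_flag() {
  if grep -q -- '--kubelet-insecure-tls' "$1"; then
    return 0
  fi
  awk '
    { print }
    /- --secure-port=/ {
      # Replace the argument text but keep the leading whitespace.
      sub(/- --secure-port=.*/, "- --kubelet-insecure-tls")
      print
    }
  ' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Usage, before kubectl apply:
#   add_insecure_tls_flag components.yaml
```

Skipping kubelet certificate verification is a convenience for lab clusters; it should not be carried into production unchanged.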
# Check the pod status:
[root@master 13:11:31]# kubectl get pods -A | grep metrics-server
kube-system metrics-server-85c75cb9b4-8nrqh 1/1 Running 0 76s
# View node and pod resource usage:
kubectl top nodes # shows CPU and memory usage for all nodes
kubectl top pods -A # shows CPU and memory usage for pods in all namespaces
[root@master 13:11:49]# kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
master 211m 7% 1974Mi 54%
node1 129m 4% 2148Mi 59%
node2 103m 3% 2100Mi 58%
[root@master 13:11:55]# kubectl top pods -A
NAMESPACE NAME CPU(cores) MEMORY(bytes)
kube-system calico-kube-controllers-6b44cbc54d-fdxjk 2m 21Mi
kube-system calico-node-7q9pl 38m 186Mi
kube-system calico-node-kwtrj 38m 192Mi
kube-system calico-node-vwskq 28m 199Mi
kube-system coredns-5f98f8d567-5lldq 3m 21Mi
kube-system coredns-5f98f8d567-j5874 2m 20Mi
kube-system etcd-master 29m 70Mi
kube-system kube-apiserver-master 55m 319Mi
kube-system kube-controller-manager-master 15m 98Mi
kube-system kube-proxy-96xr5 1m 22Mi
kube-system kube-proxy-f9wl6 1m 21Mi
kube-system kube-proxy-sqfrh 1m 18Mi
kube-system kube-scheduler-master 4m 35Mi
kube-system metrics-server-85c75cb9b4-8nrqh 4m 16Mi
kubernetes-dashboard dashboard-metrics-scraper-bd84c9d8b-x2gmj 1m 15Mi
kubernetes-dashboard kubernetes-dashboard-5cc694d9b-825gq 1m 23Mi
Issue 5: The token lifetime is too short.
Method 1:
# Print the current Deployment
kubectl -n kubernetes-dashboard get deploy kubernetes-dashboard -o yaml
# Edit it and add - --token-ttl=43200 (new argument; the unit is seconds, 43200 s = 12 h)
kubectl -n kubernetes-dashboard edit deploy kubernetes-dashboard
spec:
  containers:
  - args:
    - --auto-generate-certificates
    - --namespace=kubernetes-dashboard
    - --token-ttl=43200 # new argument; the unit is seconds, 43200 s = 12 h
# Verify the change
[root@master 14:01:34]# kubectl -n kubernetes-dashboard get deploy kubernetes-dashboard -o yaml | grep "token-ttl"
- --token-ttl=43200
# If the change prevents the Dashboard from starting, roll back with:
kubectl -n kubernetes-dashboard rollout undo deploy kubernetes-dashboard
Method 2:
# Generate a long-lived token directly
kubectl -n kubernetes-dashboard create token dashboard-admin --duration=720h # valid for 720 hours (30 days)
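The --token-ttl value in method 1 is given in seconds, so it is worth computing rather than typing by hand. The sketch below does the conversion and builds a JSON patch that appends the argument without opening an editor. The helper name hours_to_seconds is hypothetical, and the patch assumes the Dashboard is the first container in the Deployment, as in the excerpt above.

```shell
#!/bin/sh
# Convert a lifetime in hours to the seconds expected by --token-ttl.
hours_to_seconds() {
  echo $(( $1 * 3600 ))
}

ttl=$(hours_to_seconds 12)
echo "--token-ttl=$ttl"   # 12 hours -> --token-ttl=43200

# JSON patch that appends the argument to the first container's args.
# Apply it on the master with:
#   kubectl -n kubernetes-dashboard patch deploy kubernetes-dashboard \
#     --type=json -p "$patch"
patch='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--token-ttl='"$ttl"'"}]'
echo "$patch"
```

The two methods control different things: --token-ttl is the Dashboard's own session timeout, while --duration on kubectl create token requests a validity period for the token itself.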