During a recent round of maintenance on a Kubernetes cluster, I noticed that Pods using the same image kept being scheduled onto one fixed node, leaving resources unevenly distributed across the cluster's nodes. After enabling the scheduler's score logging, the behavior turned out to be caused by the ImageLocality score plugin: only one node already had the Pod's image, so the scheduler gave that node the highest score. Disabling the plugin resolved the uneven scheduling. I'm sharing the steps here in the hope that they help someone else.
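Before touching the scheduler, it can be worth confirming the diagnosis by checking which nodes already have the image cached. A minimal sketch, assuming jq is installed and using "myapp" as a stand-in for the real image name:
# (Optional) list the nodes whose node status reports the image; replace "myapp" with your image name
kubectl get nodes -o json | jq -r '.items[] | .metadata.name as $n | .status.images[]?.names[]? | select(contains("myapp")) | "\($n)  \(.)"'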
1. Create a custom configuration file
# vi /etc/kubernetes/config.yaml
# Disable the ImageLocality score plugin here (by setting its weight to 0)
...
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: /etc/kubernetes/scheduler.conf
profiles:
  - schedulerName: default-scheduler
    plugins:
      multiPoint:
        enabled:
          - name: ImageLocality
            weight: 0
...
- (Optional) A second way to disable the score plugin; adapt it to your own setup:
# vi /etc/kubernetes/config.yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: /etc/kubernetes/scheduler.conf
profiles:
  - schedulerName: default-scheduler
    plugins:
      score:
        disabled:
          - name: ImageLocality
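Whichever variant you choose, a quick sanity check that the file exists on the control-plane node and actually mentions the plugin can save a scheduler restart loop later:
ls -l /etc/kubernetes/config.yaml
grep -n -A1 ImageLocality /etc/kubernetes/config.yaml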
2. Configure kube-scheduler
# Edit kube-scheduler.yaml:
# add the --config flag to spec.containers[0].command and mount the configuration file
vi /etc/kubernetes/manifests/kube-scheduler.yaml
...
    - --config=/etc/kubernetes/config.yaml
...
    - mountPath: /etc/kubernetes/config.yaml   # add this volume mount
      name: config
      readOnly: true
...
  - hostPath:
      path: /etc/kubernetes/config.yaml
      type: FileOrCreate
    name: config
...
# kube-scheduler is a static Pod, so the kubelet restarts it automatically; wait for it to come back
kubectl get pod -n kube-system
# The complete kube-scheduler.yaml is shown below
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-scheduler
    tier: control-plane
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-scheduler
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    - --bind-address=127.0.0.1
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --leader-elect=true
    - --config=/etc/kubernetes/config.yaml
    - --v=10
    image: registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:v1.31.0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10259
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: kube-scheduler
    resources:
      requests:
        cpu: 100m
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10259
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /etc/kubernetes/scheduler.conf
      name: kubeconfig
      readOnly: true
    - mountPath: /etc/kubernetes/config.yaml
      name: config
      readOnly: true
  hostNetwork: true
  priority: 2000001000
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  - hostPath:
      path: /etc/kubernetes/scheduler.conf
      type: FileOrCreate
    name: kubeconfig
  - hostPath:
      path: /etc/kubernetes/config.yaml
      type: FileOrCreate
    name: config
status: {}
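If the scheduler Pod does not come back after the manifest change (for example, because of a typo in config.yaml), the kubelet will keep restarting it. A rough troubleshooting sketch, assuming a containerd-based control-plane node with crictl available:
# On the control-plane node: find the (possibly exited) kube-scheduler container and read its logs
crictl ps -a | grep kube-scheduler
crictl logs <container-id>
# From the cluster side, the static Pod's events can also help
kubectl -n kube-system describe pod kube-scheduler-master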
3. Verify that the custom configuration takes effect
# See the following post for how to enable verbose kube-scheduler logging:
# https://blog.csdn.net/mm1234556/article/details/148686859
# Dump the loaded configuration from the scheduler logs
kubectl logs -n kube-system kube-scheduler-master | grep -A50 apiVersion:
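If the configuration dump shows up in the logs, a follow-up grep can confirm how ImageLocality is now configured (listed with weight 0, or under the disabled plugins, depending on which variant you used):
kubectl logs -n kube-system kube-scheduler-master | grep -B2 -A2 ImageLocality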
# Manually create a Pod, then look at the per-plugin scoring details in the kube-scheduler logs
kubectl logs kube-scheduler-master -n kube-system | grep -A10 score
# Detailed log output below; the Pod is ultimately scheduled onto node04
I0623 01:28:10.258309 1 resource_allocation.go:78] mysql-server-68468bcd96-j66bj -> node02: NodeResourcesBalancedAllocation, map of allocatable resources map[cpu:28000 memory:32772333568], map of requested resources map[cpu:3200 memory:5674337280] ,score 94,
I0623 01:28:10.258309 1 resource_allocation.go:78] mysql-server-68468bcd96-j66bj -> node01: NodeResourcesBalancedAllocation, map of allocatable resources map[cpu:28000 memory:32772333568], map of requested resources map[cpu:3820 memory:6427770880] ,score 94,
I0623 01:28:10.258312 1 resource_allocation.go:78] mysql-server-68468bcd96-j66bj -> node04: NodeResourcesBalancedAllocation, map of allocatable resources map[cpu:8000 memory:32122888192], map of requested resources map[cpu:1050 memory:1468006400] ,score 91,
I0623 01:28:10.258334 1 resource_allocation.go:78] mysql-server-68468bcd96-j66bj -> node02: NodeResourcesLeastAllocated, map of allocatable resources map[cpu:28000 memory:32772333568], map of requested resources map[cpu:3200 memory:5674337280] ,score 85,
I0623 01:28:10.258339 1 resource_allocation.go:78] mysql-server-68468bcd96-j66bj -> node04: NodeResourcesLeastAllocated, map of allocatable resources map[cpu:8000 memory:32122888192], map of requested resources map[cpu:1050 memory:1468006400] ,score 90,
I0623 01:28:10.258338 1 resource_allocation.go:78] mysql-server-68468bcd96-j66bj -> node01: NodeResourcesLeastAllocated, map of allocatable resources map[cpu:28000 memory:32772333568], map of requested resources map[cpu:3820 memory:6427770880] ,score 83,
I0623 01:28:10.258375 1 generic_scheduler.go:504] Plugin NodePreferAvoidPods scores on test-5/mysql-server-68468bcd96-j66bj => [{node01 1000000} {node04 1000000} {node02 1000000}]
I0623 01:28:10.258384 1 generic_scheduler.go:504] Plugin PodTopologySpread scores on test-5/mysql-server-68468bcd96-j66bj => [{node01 0} {node04 0} {node02 0}]
I0623 01:28:10.258389 1 generic_scheduler.go:504] Plugin TaintToleration scores on test-5/mysql-server-68468bcd96-j66bj => [{node01 100} {node04 100} {node02 100}]
I0623 01:28:10.258393 1 generic_scheduler.go:504] Plugin NodeResourcesBalancedAllocation scores on test-5/mysql-server-68468bcd96-j66bj => [{node01 94} {node04 91} {node02 94}]
I0623 01:28:10.258396 1 generic_scheduler.go:504] Plugin InterPodAffinity scores on test-5/mysql-server-68468bcd96-j66bj => [{node01 0} {node04 0} {node02 0}]
I0623 01:28:10.258400 1 generic_scheduler.go:504] Plugin NodeResourcesLeastAllocated scores on test-5/mysql-server-68468bcd96-j66bj => [{node01 83} {node04 90} {node02 85}]
I0623 01:28:10.258404 1 generic_scheduler.go:504] Plugin NodeAffinity scores on test-5/mysql-server-68468bcd96-j66bj => [{node01 0} {node04 0} {node02 0}]
I0623 01:28:10.258409 1 generic_scheduler.go:560] Host node01 => Score 1000277
I0623 01:28:10.258412 1 generic_scheduler.go:560] Host node04 => Score 1000281
I0623 01:28:10.258414 1 generic_scheduler.go:560] Host node02 => Score 1000279
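For reference, a host's final score is the weighted sum of its plugin scores, and with ImageLocality disabled it no longer contributes anything. In this run every plugin shown is effectively weighted 1, so for node04 the total works out to 1000000 (NodePreferAvoidPods) + 100 (TaintToleration) + 91 (NodeResourcesBalancedAllocation) + 90 (NodeResourcesLeastAllocated) = 1000281, which matches the "Host node04 => Score 1000281" line and is the highest of the three nodes; that is why the Pod lands on node04.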