Kubernetes Network Performance Testing in a Calico Plugin Environment

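This article takes a first look at network performance in a Kubernetes cluster that uses the Calico network plugin, and explores practical ways to test it.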

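1. Test preparation

1.1 Test environment

The test environment is a Kubernetes cluster built on VMware Workstation virtual machines. Kubernetes is version 1.28.2, and the network plugin is Calico 3.28.0.

hostname	ip	spec	notes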
master	192.168.0.61	2C4G	master
node1	192.168.0.62	2C4G	worker
node2	192.168.0.63	2C4G	worker

The test application sample-webapp is already deployed with three replicas:

root@master1:~# kubectl get svc
NAME            TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)        AGE
kubernetes      ClusterIP      10.96.0.1        <none>          443/TCP        15d
sample-webapp   ClusterIP      10.106.172.112   <none>          8000/TCP       47s
root@master1:~# kubectl get pod
NAME                            READY   STATUS    RESTARTS   AGE
sample-webapp-77745d4db-57rbx   1/1     Running   0          50s
sample-webapp-77745d4db-vprjc   1/1     Running   0          50s
sample-webapp-77745d4db-wltzl   1/1     Running   0          12s

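Notes:

In this test environment the NoSchedule taint was removed from the master node, so business pods can also be scheduled onto it.
The demo program used for the tests is available from the following repository: https://gitee.com/lldhsds/distributed-load-testing-using-k8s.git. Only sample-webapp needs to be deployed. Other applications, such as nginx, can be used for the tests instead.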

1.2 Test scenarios

From a Kubernetes cluster node, through the Cluster IP
From inside the Kubernetes cluster, through the service
From outside the Kubernetes cluster, through the address exposed by MetalLB

1.3 Test tools

curl
Test program: sample-webapp. For the source code, see the Kubernetes distributed load testing project on GitHub.

1.4 Test method

Response times are obtained by sending curl requests to sample-webapp. A plain curl request returns:

root@master1:~# curl "http://10.106.172.112:8000"
Welcome to the "Distributed Load Testing Using Kubernetes" sample web app

2. Network latency tests

2.1 Access from a Kubernetes cluster node through the Cluster IP

Test command

curl -o /dev/null -s -w '%{time_connect} %{time_starttransfer} %{time_total}' "http://10.106.172.112:8000"

Run 10 measurements and take the average:

[root@k8s-node1 ~]# echo "time_connect  time_starttransfer time_total"; for i in {1..10}; do curl -o /dev/null -s -w '%{time_connect} %{time_starttransfer} %{time_total}\n' "http://10.106.172.112:8000"; done
time_connect  time_starttransfer time_total
0.000362 0.001538 0.001681
0.000456 0.002825 0.003137
0.000744 0.003152 0.003394
0.000388 0.001750 0.001847
0.000617 0.002405 0.003527
0.000656 0.002153 0.002390
0.000204 0.001597 0.001735
0.000568 0.002759 0.003117
0.000826 0.002278 0.002463
0.000582 0.002062 0.002120

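Average response time (mean of time_total): 2.541 ms

Metrics:

time_connect: time taken to establish the TCP connection to the server
time_starttransfer: time from the start of the request until the web server returns the first byte of data
time_total: time taken to complete the whole request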
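The averaging step can be reproduced with a small script. The following is a minimal sketch, assuming the curl loop output has been captured as text; the function name mean_total_ms and the constant SAMPLE_OUTPUT are hypothetical names introduced here, and the sample values are the ten lines from the run above.

```python
# Sample output copied from the 2.1 run (header line plus ten samples).
SAMPLE_OUTPUT = """\
time_connect  time_starttransfer time_total
0.000362 0.001538 0.001681
0.000456 0.002825 0.003137
0.000744 0.003152 0.003394
0.000388 0.001750 0.001847
0.000617 0.002405 0.003527
0.000656 0.002153 0.002390
0.000204 0.001597 0.001735
0.000568 0.002759 0.003117
0.000826 0.002278 0.002463
0.000582 0.002062 0.002120
"""


def mean_total_ms(curl_output: str) -> float:
    """Average the third column (time_total, in seconds) and return milliseconds."""
    totals = []
    for line in curl_output.strip().splitlines():
        fields = line.split()
        # Expect exactly three whitespace-separated fields per sample line.
        if len(fields) != 3:
            continue
        try:
            totals.append(float(fields[2]))
        except ValueError:
            # The header line is not numeric, so it is skipped here.
            continue
    if not totals:
        raise ValueError("no timing samples found")
    return sum(totals) / len(totals) * 1000.0


if __name__ == "__main__":
    print(f"{mean_total_ms(SAMPLE_OUTPUT):.3f} ms")  # prints "2.541 ms"
```

The same function can be applied to the output of the other two scenarios, since they use the same curl format string.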

2.2 Access from inside the Kubernetes cluster through the service

# Enter the test client pod
root@master1:~# kubectl exec -it test-78bcb7b45d-t9mvw -- /bin/bash
# Run the test
test-78bcb7b45d-t9mvw:/# echo "time_connect  time_starttransfer time_total"; for i in {1..10}; do curl -o /dev/null -s -w '%{time_connect} %{time_starttransfer} %{time_total}\n' "http://sample-webapp:8000/"; done
time_connect  time_starttransfer time_total
0.001359 0.004808 0.005001
0.001470 0.003394 0.003508
0.001888 0.003702 0.004018
0.001630 0.003267 0.003622
0.001621 0.003235 0.003491
0.001008 0.002891 0.002951
0.001508 0.003137 0.003313
0.004427 0.006035 0.006293
0.001512 0.002924 0.002962
0.001649 0.003774 0.003981

Average response time: about 3.914 ms

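2.3 Access from outside the cluster through the LoadBalancer address

This article uses a LoadBalancer-type service (the external address is assigned by MetalLB). The service can also be changed to the NodePort type for this test.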

root@master1:~# kubectl get svc
NAME            TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)          AGE
kubernetes      ClusterIP      10.96.0.1        <none>          443/TCP          15d
sample-webapp   LoadBalancer   10.98.247.190    192.168.0.242   80:31401/TCP     6m16s

On the external client, add a name resolution entry mapping sample-webapp.test.com to 192.168.0.242, then run the access test:

$ echo "time_connect  time_starttransfer time_total"; for i in {1..10}; do curl -o /dev/null -s -w '%{time_connect} %{time_starttransfer} %{time_total}\n' "http://sample-webapp.test.com/"; done
time_connect  time_starttransfer time_total
0.015586 0.017507 0.028004
0.039582 0.042034 0.052532
0.006702 0.008856 0.019461
0.041783 0.046729 0.057053
0.005912 0.008222 0.019276
0.006669 0.009513 0.020058
0.006592 0.010141 0.020598
0.011113 0.014444 0.026173
0.006471 0.008854 0.019571
0.041217 0.043651 0.055667

Average response time: 31.839 ms
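The ten external samples range from about 19 ms to about 57 ms, so the mean on its own hides a fair amount of spread. As a rough sketch, the following summarizes the same time_total values with the minimum, median, mean, and maximum; the function name summarize_ms and the constant EXTERNAL_TOTALS_S are hypothetical, and the values are copied from the run above.

```python
import statistics

# time_total values (seconds) copied from the external access run above.
EXTERNAL_TOTALS_S = [
    0.028004, 0.052532, 0.019461, 0.057053, 0.019276,
    0.020058, 0.020598, 0.026173, 0.019571, 0.055667,
]


def summarize_ms(samples_s):
    """Return min, median, mean and max of the samples, converted to milliseconds."""
    if not samples_s:
        raise ValueError("no samples")
    ms = [s * 1000.0 for s in samples_s]
    return {
        "min": min(ms),
        "median": statistics.median(ms),
        "mean": statistics.mean(ms),
        "max": max(ms),
    }


if __name__ == "__main__":
    for name, value in summarize_ms(EXTERNAL_TOTALS_S).items():
        print(f"{name}: {value:.3f} ms")
```

With so few samples the median is only indicative, but it is less sensitive than the mean to the handful of slower requests.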

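2.4 Test results

The response times measured in the three scenarios are:

From a Kubernetes cluster node, through the Cluster IP: 2.541 ms
From inside the Kubernetes cluster, through the service: 3.914 ms
From outside the Kubernetes cluster, through the address exposed by MetalLB: 31.839 ms

Notes:

Whether the node or pod running the test is on the same host as the pod backing the service may have some effect on the first two scenarios.
The results are for reference only; they depend on the specific resource configuration, network environment, and other factors.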

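3. Network performance tests

The network uses Calico, tested in its native routing (BGP), VXLAN, and IP-IP modes; iperf3 is used for the measurements.

Server command: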

iperf3 -s -p 12345 -i 1

Client command (the runs in this article use -t 60 rather than -t 10):

iperf3 -c ${server-ip} -p 12345 -i 1 -t 10
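When several modes are compared, it can be convenient to let iperf3 emit JSON (its -J / --json option) and extract the summary figures programmatically instead of reading them off the terminal. The sketch below assumes a TCP test whose JSON report has end.sum_sent and end.sum_received sections; the function name summarize_iperf3 is hypothetical, and EXAMPLE_REPORT is a hand-written fragment containing only the fields the function reads, not real iperf3 output.

```python
import json

# Hand-written fragment for illustration only; a real report has many more fields.
EXAMPLE_REPORT = """
{
  "end": {
    "sum_sent": {"bits_per_second": 2850000000, "retransmits": 421},
    "sum_received": {"bits_per_second": 2850000000}
  }
}
"""


def summarize_iperf3(json_text: str) -> dict:
    """Extract sender/receiver bitrate (Gbits/sec) and retransmits from an iperf3 JSON report."""
    report = json.loads(json_text)
    end = report["end"]
    sent = end["sum_sent"]
    received = end["sum_received"]
    return {
        "sender_gbps": sent["bits_per_second"] / 1e9,
        "receiver_gbps": received["bits_per_second"] / 1e9,
        # Retransmits are reported on the sending side of a TCP test; may be absent.
        "retransmits": sent.get("retransmits"),
    }


if __name__ == "__main__":
    print(summarize_iperf3(EXAMPLE_REPORT))
```

In practice the JSON text would come from running the client with -J added and capturing its standard output.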

3.1 Between nodes

# Start the iperf3 server on node1
[root@node1 ~]# iperf3 -s -p 12345 -i 1
-----------------------------------------------------------
Server listening on 12345

# Start the iperf3 client on node2
root@node2:~# iperf3 -c 192.168.0.62 -p 12345 -i 1 -t 60
Connecting to host 192.168.0.62, port 12345
[  5] local 192.168.0.63 port 40948 connected to 192.168.0.62 port 12345
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   325 MBytes  2.72 Gbits/sec  107   1.53 MBytes
...
[  5]  59.00-60.00  sec   339 MBytes  2.84 Gbits/sec    2   1.29 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec  19.9 GBytes  2.85 Gbits/sec  421             sender
[  5]   0.00-60.00  sec  19.9 GBytes  2.85 Gbits/sec                  receiver

3.2 Between pods on different nodes (native routing, BGP mode)

Check Calico's current network mode, which is also the Calico default configuration:

root@master1:~# calicoctl node status
Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+--------------+-------------------+-------+----------+-------------+
| 192.168.0.62 | node-to-node mesh | up    | 04:06:45 | Established |
| 192.168.0.63 | node-to-node mesh | up    | 04:06:39 | Established |
+--------------+-------------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.

root@master1:~# calicoctl get ipPool default-ipv4-ippool -o wide
NAME                  CIDR            NAT    IPIPMODE   VXLANMODE     DISABLED   DISABLEBGPEXPORT   SELECTOR
default-ipv4-ippool   10.244.0.0/16   true   Never      CrossSubnet   false      false              all()

# Check the node routing table: cross-node pod traffic is forwarded directly through the node NIC (ens33)
root@master1:~# ip route
default via 192.168.0.1 dev ens33 proto static
10.244.44.0/24 via 192.168.0.63 dev ens33 proto 80 onlink
10.244.154.0/24 via 192.168.0.62 dev ens33 proto 80 onlink
...

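The current network mode is BGP with VXLANMODE set to CrossSubnet: VXLAN encapsulation is applied only when traffic is forwarded between nodes in different subnets, while nodes in the same subnet use plain layer-3 routing.

Notes:

The configuration above is the Calico default. When the nodes are in the same subnet, pod-to-pod traffic is forwarded directly using the node routing tables, which is similar to flannel's host-gw mode. Only traffic between nodes in different subnets goes through VXLAN tunnel encapsulation and decapsulation.

Calico also supports sending pod traffic between same-subnet nodes through a VXLAN tunnel: change the parameter above to vxlanMode: Always. Because of the overlay encapsulation there is a performance cost, which is similar to flannel's vxlan mode.

For details, see: Overlay networking | Calico Documentation (tigera.io)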
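The three values of vxlanMode (and likewise ipipMode) described above can be modelled as a small decision function. This is only an illustrative sketch of the rule as stated in the text, not Calico's implementation; the function name uses_encapsulation is hypothetical, and the /24 node subnet prefix is an assumption, since the article does not state the netmask of the 192.168.0.x network.

```python
import ipaddress


def uses_encapsulation(mode: str, src_node: str, dst_node: str, prefix_len: int = 24) -> bool:
    """Return True if traffic between the two nodes would be encapsulated under `mode`.

    mode is one of "Never", "Always", "CrossSubnet" (case-insensitive).
    prefix_len is the ASSUMED prefix length of the node subnet.
    """
    mode = mode.lower()
    if mode == "never":
        return False
    if mode == "always":
        return True
    if mode == "crosssubnet":
        # Encapsulate only when the two node addresses fall in different subnets.
        src_net = ipaddress.ip_interface(f"{src_node}/{prefix_len}").network
        dst_net = ipaddress.ip_interface(f"{dst_node}/{prefix_len}").network
        return src_net != dst_net
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    # node1 and node2 of this article's environment, assumed to share a /24.
    print(uses_encapsulation("CrossSubnet", "192.168.0.62", "192.168.0.63"))  # False
    print(uses_encapsulation("Always", "192.168.0.62", "192.168.0.63"))       # True
```

Under this assumption all three nodes of the test environment are in one subnet, which matches the routing table shown above, where pod routes go straight out of ens33.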

Test network performance between pods on different nodes:

# Deploy the test pods, spread across two nodes
[root@k8s-master ~]# cat test-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
spec:
  replicas: 3
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
      - name: test
        image: lldhsds/alpine:perf-20240717
        command: ["sleep", "3600"]

[root@master1 ~]# kubectl get pod -o wide
NAME               READY   STATUS    RESTARTS   AGE   IP             NODE         NOMINATED NODE   READINESS GATES
test-78bcb7b45d-mpfgn   1/1   Running     0     50s   10.244.44.136    node2     <none>           <none>
test-78bcb7b45d-t9mvw   1/1   Running     0     84m   10.244.154.221   node1     <none>           <none>
...

# Start the iperf3 server in the pod on node2
root@master1:~# kubectl exec -it test-78bcb7b45d-mpfgn -- /bin/bash
test-78bcb7b45d-mpfgn:/# iperf3 -s -p 12345 -i 1

# Start the iperf3 client in the pod on node1
root@master1:~# kubectl exec -it test-78bcb7b45d-t9mvw -- /bin/bash
test-78bcb7b45d-t9mvw:/# iperf3 -c 10.244.44.136 -p 12345 -i 1 -t 60
Connecting to host 10.244.44.136, port 12345
[  5] local 10.244.154.221 port 39024 connected to 10.244.44.136 port 12345
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   351 MBytes  2.94 Gbits/sec  1565   1.55 MBytes
...
[  5]  59.00-60.00  sec   347 MBytes  2.91 Gbits/sec   25   1.33 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec  19.2 GBytes  2.75 Gbits/sec  4355             sender
[  5]   0.00-60.01  sec  19.2 GBytes  2.75 Gbits/sec                  receiver

Note:

The test image has been uploaded to Docker Hub and can be pulled with: docker pull lldhsds/alpine:perf-20240717

3.3 Between a node and a pod on a different node (native routing, BGP mode)

# Start the iperf3 server on node1
[root@node1 ~]# iperf3 -s -p 12345 -i 1
-----------------------------------------------------------
Server listening on 12345

# Start the iperf3 client in the pod on node2
[root@k8s-master ~]# kubectl get pod -o wide
NAME                 READY   STATUS RESTARTS   AGE     IP             NODE         NOMINATED NODE   READINESS GATES
test-78bcb7b45d-mpfgn  1/1   Running    0    40m     10.244.44.136    node2     <none>           <none>
test-78bcb7b45d-t9mvw  1/1   Running    0    124m    10.244.154.221   node1     <none>           <none>
root@master1:~# kubectl exec -it test-78bcb7b45d-mpfgn -- /bin/bash
test-78bcb7b45d-mpfgn:/# iperf3 -c 192.168.0.62 -p 12345 -i 1 -t 60
Connecting to host 192.168.0.62, port 12345
[  5] local 10.244.44.136 port 32770 connected to 192.168.0.62 port 12345
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   328 MBytes  2.75 Gbits/sec  771   1.48 MBytes
...
[  5]  59.00-60.00  sec   326 MBytes  2.73 Gbits/sec  191   1.14 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec  19.4 GBytes  2.78 Gbits/sec  4237             sender
[  5]   0.00-60.00  sec  19.4 GBytes  2.78 Gbits/sec                  receiver

3.4 Between pods on different nodes (VXLAN)

Change VXLANMODE in the Calico IP pool to Always, so that VXLAN encapsulation and decapsulation are used whenever pods communicate across nodes, even within the same subnet.

# View Calico's IPv4 address pool
root@master1:~# calicoctl get ippool
NAME                  CIDR            SELECTOR
default-ipv4-ippool   10.244.0.0/16   all()

# Export the configuration of the default IPv4 address pool
root@master1:~# calicoctl get ippool default-ipv4-ippool -o yaml > ippool.yaml

# Change vxlanMode; the default is CrossSubnet
root@master1:~# vi ippool.yaml
...
vxlanMode: Always

# Apply the change
root@master1:~# calicoctl apply -f ippool.yaml
Successfully applied 1 'IPPool' resource(s)

root@master1:~# calicoctl get ippool default-ipv4-ippool -o wide
NAME                  CIDR            NAT    IPIPMODE   VXLANMODE   DISABLED   DISABLEBGPEXPORT   SELECTOR
default-ipv4-ippool   10.244.0.0/16   true   Never      Always      false      false              all()

# Now check the node routing table: cross-node pod traffic goes through vxlan.calico for encapsulation
root@master1:~# ip route
default via 192.168.0.1 dev ens33 proto static
10.244.44.0/24 via 10.244.44.0 dev vxlan.calico onlink
10.244.154.0/24 via 10.244.154.0 dev vxlan.calico onlink
...
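The export, edit, and apply steps can also be scripted instead of editing the file by hand in vi. The sketch below works on the pool as a Python dict, for example one loaded from the output of calicoctl get ippool default-ipv4-ippool -o json, and returns a patched copy. The function name with_encapsulation is hypothetical, and EXAMPLE_POOL is a hand-written, trimmed stand-in for the exported resource. The check that ipipMode and vxlanMode are not both enabled reflects my understanding of Calico's validation and should be treated as an assumption.

```python
import copy

VALID_MODES = {"Never", "Always", "CrossSubnet"}

# Hand-written, trimmed example of an exported IPPool; real exports carry more metadata.
EXAMPLE_POOL = {
    "apiVersion": "projectcalico.org/v3",
    "kind": "IPPool",
    "metadata": {"name": "default-ipv4-ippool"},
    "spec": {
        "cidr": "10.244.0.0/16",
        "ipipMode": "Never",
        "vxlanMode": "CrossSubnet",
        "natOutgoing": True,
    },
}


def with_encapsulation(pool: dict, ipip: str = "Never", vxlan: str = "Never") -> dict:
    """Return a copy of the pool with ipipMode and vxlanMode set to the given values."""
    if ipip not in VALID_MODES or vxlan not in VALID_MODES:
        raise ValueError("mode must be one of Never, Always, CrossSubnet")
    # Assumption: only one of the two encapsulations may be enabled on a pool.
    if ipip != "Never" and vxlan != "Never":
        raise ValueError("enable either IP-IP or VXLAN, not both")
    patched = copy.deepcopy(pool)  # leave the caller's dict untouched
    spec = patched.setdefault("spec", {})
    spec["ipipMode"] = ipip
    spec["vxlanMode"] = vxlan
    return patched


if __name__ == "__main__":
    vxlan_pool = with_encapsulation(EXAMPLE_POOL, vxlan="Always")
    print(vxlan_pool["spec"]["ipipMode"], vxlan_pool["spec"]["vxlanMode"])  # Never Always
```

The patched dict can be written out with json.dump and passed to calicoctl apply -f, which accepts JSON as well as YAML.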

Run the pod-to-pod bandwidth test:

# Start the iperf3 server in the pod on node2
root@master1:~# kubectl exec -it test-78bcb7b45d-mpfgn -- /bin/bash
test-78bcb7b45d-mpfgn:/# iperf3 -s -p 12345 -i 1

# Start the iperf3 client in the pod on node1
test-78bcb7b45d-t9mvw:/# iperf3 -c 10.244.44.136 -p 12345 -i 1 -t 60
Connecting to host 10.244.44.136, port 12345
[  5] local 10.244.154.221 port 59994 connected to 10.244.44.136 port 12345
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   123 MBytes  1.03 Gbits/sec  196    793 KBytes
[  5]   1.00-2.00   sec   120 MBytes  1.00 Gbits/sec   73    896 KBytes
...
[  5]  59.00-60.00  sec   172 MBytes  1.45 Gbits/sec    0   2.30 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec  6.83 GBytes   978 Mbits/sec  1650             sender
[  5]   0.00-60.02  sec  6.83 GBytes   977 Mbits/sec                  receiver

3.5 Between a node and a pod on a different node (VXLAN)

# Start the iperf3 server on node1
[root@node1 ~]# iperf3 -s -p 12345 -i 1
-----------------------------------------------------------
Server listening on 12345

# Start the iperf3 client in the pod on node2
[root@k8s-master ~]# kubectl get pod -o wide
NAME                 READY   STATUS RESTARTS   AGE     IP             NODE         NOMINATED NODE   READINESS GATES
test-78bcb7b45d-mpfgn  1/1   Running    0    40m     10.244.44.136    node2     <none>           <none>
test-78bcb7b45d-t9mvw  1/1   Running    0    124m    10.244.154.221   node1     <none>           <none>
root@master1:~# kubectl exec -it test-78bcb7b45d-mpfgn -- /bin/bash
test-78bcb7b45d-mpfgn:/# iperf3 -c 192.168.0.62 -p 12345 -i 1 -t 60
Connecting to host 192.168.0.62, port 12345
[  5] local 10.244.44.136 port 51004 connected to 192.168.0.62 port 12345
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   346 MBytes  2.89 Gbits/sec  992   1.46 MBytes
[  5]   1.00-2.00   sec   350 MBytes  2.94 Gbits/sec  347   1.61 MBytes
...
[  5]  59.00-60.00  sec   314 MBytes  2.63 Gbits/sec    0   1.49 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec  18.6 GBytes  2.67 Gbits/sec  4278             sender
[  5]   0.00-60.01  sec  18.6 GBytes  2.67 Gbits/sec                  receiver

3.6 Between pods on different nodes (IP-IP mode)

Switch to IP-IP mode:

root@master1:~# calicoctl get ippool default-ipv4-ippool -o yaml > ippool.yaml

# Change ipipMode, and also remove the vxlanMode 'CrossSubnet' setting
root@master1:~# vi ippool.yaml
...
ipipMode: Always
...
# Apply the change
root@master1:~# calicoctl apply -f ippool.yaml
Successfully applied 1 'IPPool' resource(s)

# Check the node routing table: traffic is now forwarded out through the tunl0 device
root@master1:~# ip route
default via 192.168.0.1 dev ens33 proto static
10.244.44.0/24 via 192.168.0.63 dev tunl0 proto bird onlink
10.244.154.0/24 via 192.168.0.62 dev tunl0 proto bird onlink
blackhole 10.244.161.0/24 proto bird
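Across the three configurations, the outgoing device on the pod-subnet routes is what reveals the forwarding mode: ens33 for direct routing, vxlan.calico for VXLAN, and tunl0 for IP-IP. The following sketch classifies pod routes from captured ip route output on that basis. The function name classify_pod_routes is hypothetical, the pod CIDR is the 10.244.0.0/16 pool shown earlier, and treating any other device as direct routing is a simplifying assumption.

```python
import ipaddress

POD_CIDR = ipaddress.ip_network("10.244.0.0/16")
# Tunnel devices seen in this article's routing tables.
DEVICE_MODES = {"vxlan.calico": "VXLAN", "tunl0": "IP-IP"}

# Routing table copied from the IP-IP configuration above.
SAMPLE_ROUTES = """\
default via 192.168.0.1 dev ens33 proto static
10.244.44.0/24 via 192.168.0.63 dev tunl0 proto bird onlink
10.244.154.0/24 via 192.168.0.62 dev tunl0 proto bird onlink
blackhole 10.244.161.0/24 proto bird
"""


def classify_pod_routes(ip_route_output: str) -> dict:
    """Map each pod-subnet route to the forwarding mode implied by its device."""
    result = {}
    for line in ip_route_output.splitlines():
        fields = line.split()
        if "dev" not in fields:
            continue  # e.g. blackhole routes have no outgoing device
        try:
            dest = ipaddress.ip_network(fields[0])
        except ValueError:
            continue  # e.g. the "default" route
        if not dest.subnet_of(POD_CIDR):
            continue
        dev_index = fields.index("dev") + 1
        if dev_index >= len(fields):
            continue
        # Assumption: any non-tunnel device means plain routed forwarding.
        result[str(dest)] = DEVICE_MODES.get(fields[dev_index], "direct routing")
    return result


if __name__ == "__main__":
    for subnet, mode in classify_pod_routes(SAMPLE_ROUTES).items():
        print(subnet, mode)
```

Running the same function on the routing tables from the BGP and VXLAN configurations gives "direct routing" and "VXLAN" respectively for the two remote pod subnets.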

Run the pod-to-pod bandwidth test:

# Start the iperf3 server in the pod on node2
root@master1:~# kubectl exec -it test-78bcb7b45d-mpfgn -- /bin/bash
test-78bcb7b45d-mpfgn:/# iperf3 -s -p 12345 -i 1

# Start the iperf3 client in the pod on node1
test-78bcb7b45d-t9mvw:/# iperf3 -c 10.244.44.136 -p 12345 -i 1 -t 60
Connecting to host 10.244.44.136, port 12345
[  5] local 10.244.154.221 port 59994 connected to 10.244.44.136 port 12345
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   163 MBytes  1.37 Gbits/sec  119   1.04 MBytes
[  5]   1.00-2.00   sec   129 MBytes  1.08 Gbits/sec  128   1.12 MBytes
...
[  5]  58.00-59.00  sec   172 MBytes  1.44 Gbits/sec    0   2.92 MBytes
[  5]  59.00-60.00  sec   161 MBytes  1.35 Gbits/sec    0   3.13 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec  8.60 GBytes  1.23 Gbits/sec  1256             sender
[  5]   0.00-60.02  sec  8.60 GBytes  1.23 Gbits/sec                  receiver

3.7 Between a node and a pod on a different node (IP-IP)

# Start the iperf3 server on node1
[root@node1 ~]# iperf3 -s -p 12345 -i 1
-----------------------------------------------------------
Server listening on 12345

# Start the iperf3 client in the pod on node2
[root@k8s-master ~]# kubectl get pod -o wide
NAME                 READY   STATUS RESTARTS   AGE     IP             NODE         NOMINATED NODE   READINESS GATES
test-78bcb7b45d-mpfgn  1/1   Running    0    40m     10.244.44.136    node2     <none>           <none>
test-78bcb7b45d-t9mvw  1/1   Running    0    124m    10.244.154.221   node1     <none>           <none>
root@master1:~# kubectl exec -it test-78bcb7b45d-mpfgn -- /bin/bash
test-78bcb7b45d-mpfgn:/# iperf3 -c 192.168.0.62 -p 12345 -i 1 -t 60
Connecting to host 192.168.0.62, port 12345
[  5] local 10.244.44.136 port 51004 connected to 192.168.0.62 port 12345
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   407 MBytes  3.41 Gbits/sec  506    692 KBytes
[  5]   1.00-2.00   sec   336 MBytes  2.82 Gbits/sec  577    949 KBytes
[  5]   2.00-3.00   sec   310 MBytes  2.60 Gbits/sec   92   1.13 MBytes
...
[  5]  58.00-59.00  sec   323 MBytes  2.71 Gbits/sec    3   1.28 MBytes
[  5]  59.00-60.00  sec   329 MBytes  2.76 Gbits/sec    0   1.44 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec  19.4 GBytes  2.78 Gbits/sec  3960             sender
[  5]   0.00-60.00  sec  19.4 GBytes  2.77 Gbits/sec                  receiver

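4. Summary

4.1 Bandwidth comparison

Scenario	Calico network mode	Bandwidth (Gbits/sec)
Between nodes	N/A	2.85
Between pods on different nodes	Native routing (BGP)	2.75
Node to pod on a different node	Native routing (BGP)	2.78
Between pods on different nodes	VXLAN	0.98
Node to pod on a different node	VXLAN	2.67
Between pods on different nodes	IP-IP	1.23
Node to pod on a different node	IP-IP	2.77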

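The following conclusions can be drawn from these figures:

In Calico's BGP mode, pod-to-pod throughput is within about 5% of the direct host-to-host figure (2.75 versus 2.85 Gbits/sec, roughly 3.5% lower), which is close to node-to-node performance. Allowing for fluctuation and measurement error, the network can be regarded as essentially lossless. In terms of the forwarding path, the only addition compared with direct node forwarding is the veth pair connecting the pod to the node.
In VXLAN and IP-IP modes, pod-to-pod throughput drops by more than 50%, which is poor; in both modes packets are encapsulated and decapsulated. IP-IP performs somewhat better than VXLAN, which is consistent with IP-IP having a smaller encapsulation overhead than VXLAN.
In the tests between a node and a pod on a different node, the performance loss is small.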
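The loss figures quoted in these conclusions follow directly from the table. A minimal sketch of the arithmetic, using the node-to-node result (2.85 Gbits/sec) as the baseline; the names loss_percent and RESULTS_GBPS are hypothetical, and the values are the ones measured above.

```python
BASELINE_GBPS = 2.85  # node-to-node result from section 3.1

# (scenario, mode) -> measured bandwidth in Gbits/sec, copied from the table above.
RESULTS_GBPS = {
    ("pod to pod", "BGP"): 2.75,
    ("node to pod", "BGP"): 2.78,
    ("pod to pod", "VXLAN"): 0.98,
    ("node to pod", "VXLAN"): 2.67,
    ("pod to pod", "IP-IP"): 1.23,
    ("node to pod", "IP-IP"): 2.77,
}


def loss_percent(measured_gbps: float, baseline_gbps: float = BASELINE_GBPS) -> float:
    """Throughput loss relative to the baseline, as a percentage."""
    return (baseline_gbps - measured_gbps) / baseline_gbps * 100.0


if __name__ == "__main__":
    for (scenario, mode), gbps in RESULTS_GBPS.items():
        print(f"{scenario:12s} {mode:6s} {gbps:.2f} Gbits/sec  loss {loss_percent(gbps):.1f}%")
```

For the pod-to-pod rows this gives about 3.5% for BGP, 65.6% for VXLAN, and 56.8% for IP-IP.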

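Notes:

During testing there were also runs in which pod-to-pod bandwidth was higher than node-to-node bandwidth, so the data fluctuates to some degree. The test environment consists of virtual machines created on a laptop, where the VMs compete for resources, and this affects the results.
The loss of more than 50% in VXLAN and IP-IP modes should not be taken as a general rule of thumb; it may be specific to this environment and is given here for reference only.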
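One way to see why the figure of more than 50% is unlikely to be a general rule is to look at the header bytes alone. The sketch below assumes a 1500-byte link MTU (not stated in the article) and the commonly cited per-packet overheads of 20 bytes for IP-IP (one extra IPv4 header) and 50 bytes for VXLAN over IPv4 (outer Ethernet 14 + IPv4 20 + UDP 8 + VXLAN 8). The names are hypothetical.

```python
# Commonly cited per-packet encapsulation overheads in bytes (IPv4 underlay).
ENCAP_OVERHEAD_BYTES = {"IP-IP": 20, "VXLAN": 50}


def inner_mtu(link_mtu: int, mode: str) -> int:
    """Largest inner packet that fits in one link-MTU-sized outer packet."""
    return link_mtu - ENCAP_OVERHEAD_BYTES[mode]


def header_overhead_percent(link_mtu: int, mode: str) -> float:
    """Share of each full-sized outer packet spent on encapsulation headers."""
    return ENCAP_OVERHEAD_BYTES[mode] / link_mtu * 100.0


if __name__ == "__main__":
    link_mtu = 1500  # ASSUMED; check the real value with `ip link` on the nodes
    for mode in ("IP-IP", "VXLAN"):
        print(f"{mode}: inner MTU {inner_mtu(link_mtu, mode)}, "
              f"header overhead {header_overhead_percent(link_mtu, mode):.1f}%")
```

Under these assumptions the extra headers cost only about 1.3% (IP-IP) and 3.3% (VXLAN) of each packet, so most of the measured drop would have to come from elsewhere, such as the CPU cost of encapsulating packets inside resource-constrained virtual machines. That is consistent with the note above that the result is environment-specific, but it is an inference rather than something measured here.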

For more on Calico networking, see: Overlay networking | Calico Documentation (tigera.io).

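4.2 Overall comparison

All nodes in this test environment are in the same subnet, so only two cases were compared: encapsulated forwarding and direct forwarding.

The cross-subnet case still needs further investigation. The table below summarizes the modes:

Mode	Network type	Description	Performance	Suitable scenarios
BGP	underlay	Uses BGP to propagate routes between nodes, with no encapsulation	Lowest loss	The physical network supports BGP and nodes can communicate directly. This is the preferred mode.
IPIP-CrossSubnet	underlay+overlay	Encapsulates only between nodes in different subnets; nodes in the same subnet communicate directly	Almost no loss within a subnet, higher loss across subnets	The physical network does not support BGP and most nodes are in the same subnet
IPIP-Always	overlay	Uses IPIP encapsulation for all cross-node traffic, whether or not the nodes are in different subnets	Higher loss, but better than VXLAN	The physical network does not support BGP and the cluster's nodes span multiple subnets
VXLAN-CrossSubnet	underlay+overlay	Encapsulates only between nodes in different subnets; nodes in the same subnet communicate directly	Almost no loss within a subnet, higher loss across subnets	The physical network does not support BGP and most nodes are in the same subnet
VXLAN-Always	overlay	Uses VXLAN encapsulation for all cross-node traffic, whether or not the nodes are in different subnets	Higher loss	The physical network does not support BGP and the cluster's nodes span multiple subnets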