在前文《Linux路由三大件》中,我們提到了 iptables 可以修改數據包的特征從而影響其路由。這個功能無論是傳統場景下的 防火墻,還是云原生場景下的 服務路由(k8s service)、網絡策略(calico network policy) 等都有依賴。
雖然業界在積極地推進 ipvs、ebpf 等更新、更高效的技術落地,但是出于各種各樣的原因(技術、成本、管理等),iptables 仍然有著非常廣泛的使用 ,所以我們有必要了解 iptables 的方方面面。今天我們就來了解其中的重要一環:調試和追蹤 iptables。更具體一點:一個數據包是怎么在 iptables 的各個 chain/table/rule 中游走的。
trace
我們知道,iptables 的 rule 最后是一個 target,這個 target 可以設置為 TRACE。這樣,匹配這個 rule 的包經過的所有 chain/table/rule 都會被記錄下來。
這個 rule 本身只能被放在?raw
?表里,可以存在于?PREROUTING
?和?OUTPUT
?兩個 chain 中。其匹配規則和其他一般的 rule 沒有什么差別。
實踐
我們采用一個 docker 場景來進行實踐,如下圖:
亦即宿主機的 10000 端口映射給了容器的 80 端口。圖中有兩種測試場景,分別是:
- 本節點訪問容器 IP
- 跨節點訪問宿主機 IP
為了讓分析更直觀/簡單,我們用 telnet 測試即可(curl 會產生很多后續 http 包,不夠簡潔)。
OK,開始我們的測試!
首先在宿主機上開啟 iptables log
# check first
# cat /proc/net/netfilter/nf_logmodprobe nf_log_ipv4
sysctl net.netfilter.nf_log.2=nf_log_ipv4
然后增加 trace 配置
# 訪問容器 IP 使用
iptables -t raw -A OUTPUT -p tcp --destination 172.19.0.0/16 --dport 80 -j TRACE# 訪問節點 IP 使用
iptables -t raw -A PREROUTING -p tcp --destination 192.168.64.4 --dport 10000 -j TRACE
查看結果可以用這個命令:cat /var/log/messages | grep "TRACE:"
本節點訪問:telnet 172.19.0.9 80
,結果:
fedora kernel: [ 2350.411619] TRACE: raw:OUTPUT:policy:2 IN= OUT=br-00ea7870520a SRC=172.19.0.1 DST=172.19.0.9 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=11432 DF PROTO=TCP SPT=43256 DPT=80 SEQ=2316959420 ACK=0 WINDOW=64240 RES=0x00 SYN URGP=0 OPT (020405B40402080AAF8EE4320000000001030307) UID=0 GID=0
fedora kernel: [ 2350.411634] TRACE: nat:OUTPUT:policy:2 IN= OUT=br-00ea7870520a SRC=172.19.0.1 DST=172.19.0.9 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=11432 DF PROTO=TCP SPT=43256 DPT=80 SEQ=2316959420 ACK=0 WINDOW=64240 RES=0x00 SYN URGP=0 OPT (020405B40402080AAF8EE4320000000001030307) UID=0 GID=0
fedora kernel: [ 2350.411639] TRACE: filter:OUTPUT:policy:1 IN= OUT=br-00ea7870520a SRC=172.19.0.1 DST=172.19.0.9 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=11432 DF PROTO=TCP SPT=43256 DPT=80 SEQ=2316959420 ACK=0 WINDOW=64240 RES=0x00 SYN URGP=0 OPT (020405B40402080AAF8EE4320000000001030307) UID=0 GID=0
fedora kernel: [ 2350.411644] TRACE: nat:POSTROUTING:policy:4 IN= OUT=br-00ea7870520a SRC=172.19.0.1 DST=172.19.0.9 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=11432 DF PROTO=TCP SPT=43256 DPT=80 SEQ=2316959420 ACK=0 WINDOW=64240 RES=0x00 SYN URGP=0 OPT (020405B40402080AAF8EE4320000000001030307) UID=0 GID=0
fedora kernel: [ 2350.411778] TRACE: raw:OUTPUT:policy:2 IN= OUT=br-00ea7870520a SRC=172.19.0.1 DST=172.19.0.9 LEN=52 TOS=0x10 PREC=0x00 TTL=64 ID=11433 DF PROTO=TCP SPT=43256 DPT=80 SEQ=2316959421 ACK=357062108 WINDOW=502 RES=0x00 ACK URGP=0 OPT (0101080AAF8EE4320C2ACEC2) UID=0 GID=0
fedora kernel: [ 2350.411784] TRACE: filter:OUTPUT:policy:1 IN= OUT=br-00ea7870520a SRC=172.19.0.1 DST=172.19.0.9 LEN=52 TOS=0x10 PREC=0x00 TTL=64 ID=11433 DF PROTO=TCP SPT=43256 DPT=80 SEQ=2316959421 ACK=357062108 WINDOW=502 RES=0x00 ACK URGP=0 OPT (0101080AAF8EE4320C2ACEC2) UID=0 GID=0
跨節點訪問:telnet 192.168.64.4 10000
,結果:
fedora kernel: [ 2631.978281] TRACE: raw:PREROUTING:policy:2 IN=enp0s1 OUT= MAC=fa:f3:dc:b4:4e:ef:3a:f9:d3:9e:aa:64:08:00 SRC=192.168.64.1 DST=192.168.64.4 LEN=64 TOS=0x10 PREC=0x00 TTL=64 ID=0 PROTO=TCP SPT=52842 DPT=10000 SEQ=1184533180 ACK=0 WINDOW=65535 RES=0x00 CWR ECE SYN URGP=0 OPT (020405B4010303060101080A6EF8750A0000000004020000)
fedora kernel: [ 2631.978585] TRACE: nat:PREROUTING:rule:1 IN=enp0s1 OUT= MAC=fa:f3:dc:b4:4e:ef:3a:f9:d3:9e:aa:64:08:00 SRC=192.168.64.1 DST=192.168.64.4 LEN=64 TOS=0x10 PREC=0x00 TTL=64 ID=0 PROTO=TCP SPT=52842 DPT=10000 SEQ=1184533180 ACK=0 WINDOW=65535 RES=0x00 CWR ECE SYN URGP=0 OPT (020405B4010303060101080A6EF8750A0000000004020000)
fedora kernel: [ 2631.979055] TRACE: nat:DOCKER:rule:3 IN=enp0s1 OUT= MAC=fa:f3:dc:b4:4e:ef:3a:f9:d3:9e:aa:64:08:00 SRC=192.168.64.1 DST=192.168.64.4 LEN=64 TOS=0x10 PREC=0x00 TTL=64 ID=0 PROTO=TCP SPT=52842 DPT=10000 SEQ=1184533180 ACK=0 WINDOW=65535 RES=0x00 CWR ECE SYN URGP=0 OPT (020405B4010303060101080A6EF8750A0000000004020000)
fedora kernel: [ 2631.979077] TRACE: filter:FORWARD:rule:1 IN=enp0s1 OUT=br-00ea7870520a MAC=fa:f3:dc:b4:4e:ef:3a:f9:d3:9e:aa:64:08:00 SRC=192.168.64.1 DST=172.19.0.9 LEN=64 TOS=0x10 PREC=0x00 TTL=63 ID=0 PROTO=TCP SPT=52842 DPT=80 SEQ=1184533180 ACK=0 WINDOW=65535 RES=0x00 CWR ECE SYN URGP=0 OPT (020405B4010303060101080A6EF8750A0000000004020000)
fedora kernel: [ 2631.979585] TRACE: filter:DOCKER-USER:return:1 IN=enp0s1 OUT=br-00ea7870520a MAC=fa:f3:dc:b4:4e:ef:3a:f9:d3:9e:aa:64:08:00 SRC=192.168.64.1 DST=172.19.0.9 LEN=64 TOS=0x10 PREC=0x00 TTL=63 ID=0 PROTO=TCP SPT=52842 DPT=80 SEQ=1184533180 ACK=0 WINDOW=65535 RES=0x00 CWR ECE SYN URGP=0 OPT (020405B4010303060101080A6EF8750A0000000004020000)
fedora kernel: [ 2631.979585] TRACE: filter:FORWARD:rule:2 IN=enp0s1 OUT=br-00ea7870520a MAC=fa:f3:dc:b4:4e:ef:3a:f9:d3:9e:aa:64:08:00 SRC=192.168.64.1 DST=172.19.0.9 LEN=64 TOS=0x10 PREC=0x00 TTL=63 ID=0 PROTO=TCP SPT=52842 DPT=80 SEQ=1184533180 ACK=0 WINDOW=65535 RES=0x00 CWR ECE SYN URGP=0 OPT (020405B4010303060101080A6EF8750A0000000004020000)
fedora kernel: [ 2631.979585] TRACE: filter:DOCKER-ISOLATION-STAGE-1:return:3 IN=enp0s1 OUT=br-00ea7870520a MAC=fa:f3:dc:b4:4e:ef:3a:f9:d3:9e:aa:64:08:00 SRC=192.168.64.1 DST=172.19.0.9 LEN=64 TOS=0x10 PREC=0x00 TTL=63 ID=0 PROTO=TCP SPT=52842 DPT=80 SEQ=1184533180 ACK=0 WINDOW=65535 RES=0x00 CWR ECE SYN URGP=0 OPT (020405B4010303060101080A6EF8750A0000000004020000)
fedora kernel: [ 2631.980070] TRACE: filter:FORWARD:rule:8 IN=enp0s1 OUT=br-00ea7870520a MAC=fa:f3:dc:b4:4e:ef:3a:f9:d3:9e:aa:64:08:00 SRC=192.168.64.1 DST=172.19.0.9 LEN=64 TOS=0x10 PREC=0x00 TTL=63 ID=0 PROTO=TCP SPT=52842 DPT=80 SEQ=1184533180 ACK=0 WINDOW=65535 RES=0x00 CWR ECE SYN URGP=0 OPT (020405B4010303060101080A6EF8750A0000000004020000)
fedora kernel: [ 2631.980070] TRACE: filter:DOCKER:rule:1 IN=enp0s1 OUT=br-00ea7870520a MAC=fa:f3:dc:b4:4e:ef:3a:f9:d3:9e:aa:64:08:00 SRC=192.168.64.1 DST=172.19.0.9 LEN=64 TOS=0x10 PREC=0x00 TTL=63 ID=0 PROTO=TCP SPT=52842 DPT=80 SEQ=1184533180 ACK=0 WINDOW=65535 RES=0x00 CWR ECE SYN URGP=0 OPT (020405B4010303060101080A6EF8750A0000000004020000)
fedora kernel: [ 2631.980070] TRACE: nat:POSTROUTING:policy:4 IN=enp0s1 OUT=br-00ea7870520a MAC=fa:f3:dc:b4:4e:ef:3a:f9:d3:9e:aa:64:08:00 SRC=192.168.64.1 DST=172.19.0.9 LEN=64 TOS=0x10 PREC=0x00 TTL=63 ID=0 PROTO=TCP SPT=52842 DPT=80 SEQ=1184533180 ACK=0 WINDOW=65535 RES=0x00 CWR ECE SYN URGP=0 OPT (020405B4010303060101080A6EF8750A0000000004020000)
fedora kernel: [ 2631.981288] TRACE: raw:PREROUTING:policy:2 IN=enp0s1 OUT= MAC=fa:f3:dc:b4:4e:ef:3a:f9:d3:9e:aa:64:08:00 SRC=192.168.64.1 DST=192.168.64.4 LEN=52 TOS=0x10 PREC=0x00 TTL=64 ID=0 PROTO=TCP SPT=52842 DPT=10000 SEQ=1184533181 ACK=3245308350 WINDOW=2058 RES=0x00 ACK URGP=0 OPT (0101080A6EF8750D8E0CA28F)
fedora kernel: [ 2631.981299] TRACE: filter:FORWARD:rule:1 IN=enp0s1 OUT=br-00ea7870520a MAC=fa:f3:dc:b4:4e:ef:3a:f9:d3:9e:aa:64:08:00 SRC=192.168.64.1 DST=172.19.0.9 LEN=52 TOS=0x10 PREC=0x00 TTL=63 ID=0 PROTO=TCP SPT=52842 DPT=80 SEQ=1184533181 ACK=3245308350 WINDOW=2058 RES=0x00 ACK URGP=0 OPT (0101080A6EF8750D8E0CA28F)
fedora kernel: [ 2631.981304] TRACE: filter:DOCKER-USER:return:1 IN=enp0s1 OUT=br-00ea7870520a MAC=fa:f3:dc:b4:4e:ef:3a:f9:d3:9e:aa:64:08:00 SRC=192.168.64.1 DST=172.19.0.9 LEN=52 TOS=0x10 PREC=0x00 TTL=63 ID=0 PROTO=TCP SPT=52842 DPT=80 SEQ=1184533181 ACK=3245308350 WINDOW=2058 RES=0x00 ACK URGP=0 OPT (0101080A6EF8750D8E0CA28F)
fedora kernel: [ 2631.981308] TRACE: filter:FORWARD:rule:2 IN=enp0s1 OUT=br-00ea7870520a MAC=fa:f3:dc:b4:4e:ef:3a:f9:d3:9e:aa:64:08:00 SRC=192.168.64.1 DST=172.19.0.9 LEN=52 TOS=0x10 PREC=0x00 TTL=63 ID=0 PROTO=TCP SPT=52842 DPT=80 SEQ=1184533181 ACK=3245308350 WINDOW=2058 RES=0x00 ACK URGP=0 OPT (0101080A6EF8750D8E0CA28F)
fedora kernel: [ 2631.981312] TRACE: filter:DOCKER-ISOLATION-STAGE-1:return:3 IN=enp0s1 OUT=br-00ea7870520a MAC=fa:f3:dc:b4:4e:ef:3a:f9:d3:9e:aa:64:08:00 SRC=192.168.64.1 DST=172.19.0.9 LEN=52 TOS=0x10 PREC=0x00 TTL=63 ID=0 PROTO=TCP SPT=52842 DPT=80 SEQ=1184533181 ACK=3245308350 WINDOW=2058 RES=0x00 ACK URGP=0 OPT (0101080A6EF8750D8E0CA28F)
fedora kernel: [ 2631.981316] TRACE: filter:FORWARD:rule:7 IN=enp0s1 OUT=br-00ea7870520a MAC=fa:f3:dc:b4:4e:ef:3a:f9:d3:9e:aa:64:08:00 SRC=192.168.64.1 DST=172.19.0.9 LEN=52 TOS=0x10 PREC=0x00 TTL=63 ID=0 PROTO=TCP SPT=52842 DPT=80 SEQ=1184533181 ACK=3245308350 WINDOW=2058 RES=0x00 ACK URGP=0 OPT (0101080A6EF8750D8E0CA28F)
對比這張圖
以及宿主機節點上具體的規則(刪除了 docker0 相關)
*raw
:PREROUTING ACCEPT [312:33617]
:OUTPUT ACCEPT [335:24610]
-A PREROUTING -d 192.168.64.4/32 -p tcp -m tcp --dport 10000 -j TRACE
-A OUTPUT -d 172.19.0.0/16 -p tcp -m tcp --dport 80 -j TRACE
COMMIT*nat
:PREROUTING ACCEPT [4:903]
:INPUT ACCEPT [4:903]
:OUTPUT ACCEPT [222:16027]
:POSTROUTING ACCEPT [224:16155]
:DOCKER - [0:0]
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -s 172.19.0.0/16 ! -o br-00ea7870520a -j MASQUERADE
-A POSTROUTING -s 172.19.0.9/32 -d 172.19.0.9/32 -p tcp -m tcp --dport 80 -j MASQUERADE
-A DOCKER -i br-00ea7870520a -j RETURN
-A DOCKER ! -i br-00ea7870520a -p tcp -m tcp --dport 10000 -j DNAT --to-destination 172.19.0.9:80
COMMIT*filter
:INPUT ACCEPT [308:34040]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [351:26291]
:DOCKER - [0:0]
:DOCKER-ISOLATION-STAGE-1 - [0:0]
:DOCKER-ISOLATION-STAGE-2 - [0:0]
:DOCKER-USER - [0:0]
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o br-00ea7870520a -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o br-00ea7870520a -j DOCKER
-A FORWARD -i br-00ea7870520a ! -o br-00ea7870520a -j ACCEPT
-A FORWARD -i br-00ea7870520a -o br-00ea7870520a -j ACCEPT
-A DOCKER -d 172.19.0.9/32 ! -i br-00ea7870520a -o br-00ea7870520a -p tcp -m tcp --dport 80 -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 -i br-00ea7870520a ! -o br-00ea7870520a -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o br-00ea7870520a -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
-A DOCKER-USER -j RETURN
COMMIT
分析內容可以知:
- 兩個測試都是 TCP 三步握手的第一、三步的兩個包,從包的 SEQ 字段可以驗證,也可以用 iptables rule 的 pkts 字段的差異來進行驗證
iptables -v -t raw -L PREROUTING --line-number
Chain PREROUTING (policy ACCEPT 239 packets, 24919 bytes)
num pkts bytes target prot opt in out source destination
1 8 445 TRACE tcp -- any any anywhere fedora tcp dpt:ndmp
- 對于 TCP 鏈接,只有第一個包才會經過 nat,后續包屬于?
RELATED,ESTABLISHED
,直接放過,不需要重新 nat - 對于跨節點訪問,第一個包首先來到?
PREROUTING
?鏈, 命中了 docker 添加的?DNAT
?規則:nat:DOCKER
(rule 3
),效果就是第三行到第四行?DST
?從?192.168.64.4
?變成了?172.19.0.9
iptables -v -t nat -L DOCKER --line-number
Chain DOCKER (2 references)
num pkts bytes target prot opt in out source destination
1 0 0 RETURN all -- docker0 any anywhere anywhere
2 0 0 RETURN all -- br-00ea7870520a any anywhere anywhere
3 2 128 DNAT tcp -- !br-00ea7870520a any anywhere anywhere tcp dpt:ndmp to:172.19.0.9:80
DNAT
?后走到了?Forward
?鏈,?DOCKER-USER
?和?DOCKER-ISOLATION-STAGE-1
?都?return
?了。最后通過?filter:FORWARD:rule:8
?來到?filter:DOCKER
?上,accept
?將包送到了?docker bridge: br-00ea7870520a
?上,從而送進了容器 172.19.0.9。當然,第二個包依然屬于?RELATED,ESTABLISHED
,所以在?filter:FORWARD:rule:7
?上直接?ACCEPT
,不需要再走?filter:DOCKER
?了。
?資料直通車:Linux內核源碼技術學習路線+視頻教程內核源碼
學習直通車:Linux內核源碼內存調優文件系統進程管理設備驅動/網絡協議棧