Preface
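This is a receive-window-full case I stumbled upon by accident while testing. It is an unusual one; I had never seen it in a real-world scenario before. At first I could not make sense of it, and it took some effort to figure it out, so I am writing it down to share.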
Problem Description
While studying the slow-start phase of congestion control, I wrote the following packetdrill script.
# cat tcp_troubleshooting_1_001.pkt
`ethtool -K tun0 tso off
ethtool -K tun0 gso off`
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < S 0:0(0) win 10000 <mss 1000,nop,nop,sackOK>
+0 > S. 0:0(0) ack 1 <...>
+0.01 < . 1:1(0) ack 1 win 10000
+0 accept(3, ..., ...) = 4
+0.01 < P. 1:101(100) ack 1 win 2000
+0 > . 1:1(0) ack 101
+0.01 write(4,...,4000) = 4000
+0 `sleep 10`
#
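The MSS advertised in the three-way handshake is 1000, and window scaling (WScale) is not enabled. The client then sends a data segment advertising its own Win as 2000, the server responds with a normal ACK, and the server then writes 4000 bytes of data.
I expected that, limited by the receiver's Win 2000, the server would be able to send only two 1000-byte segments. Nice in theory, but the result did not match: the server sent four 1000-byte segments, ignoring the receive window. What is going on?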
# tcpdump -i any -nn port 8080
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
22:00:20.517879 tun0 In IP 192.0.2.1.45917 > 192.168.235.30.8080: Flags [S], seq 0, win 10000, options [mss 1000,nop,nop,sackOK], length 0
22:00:20.517912 tun0 Out IP 192.168.235.30.8080 > 192.0.2.1.45917: Flags [S.], seq 2020381752, ack 1, win 64240, options [mss 1460,nop,nop,sackOK], length 0
22:00:20.528033 tun0 In IP 192.0.2.1.45917 > 192.168.235.30.8080: Flags [.], ack 1, win 10000, length 0
22:00:20.538114 tun0 In IP 192.0.2.1.45917 > 192.168.235.30.8080: Flags [P.], seq 1:101, ack 1, win 2000, length 100: HTTP
22:00:20.538139 tun0 Out IP 192.168.235.30.8080 > 192.0.2.1.45917: Flags [.], ack 101, win 64140, length 0
22:00:20.548217 tun0 Out IP 192.168.235.30.8080 > 192.0.2.1.45917: Flags [.], seq 1:1001, ack 101, win 64140, length 1000: HTTP
22:00:20.548221 tun0 Out IP 192.168.235.30.8080 > 192.0.2.1.45917: Flags [P.], seq 1001:2001, ack 101, win 64140, length 1000: HTTP
22:00:20.548226 tun0 Out IP 192.168.235.30.8080 > 192.0.2.1.45917: Flags [.], seq 2001:3001, ack 101, win 64140, length 1000: HTTP
22:00:20.548227 tun0 Out IP 192.168.235.30.8080 > 192.0.2.1.45917: Flags [P.], seq 3001:4001, ack 101, win 64140, length 1000: HTTP
#
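Problem Analysis
The send window is commonly described as min(cwnd, rwnd). The initial CWND is 10 segments, i.e. 10 × 1000 = 10000 bytes, which is more than the 4000 bytes written, so the first suspect is still rwnd; but why Win 2000 did not take effect was genuinely puzzling.
The first step was to verify rwnd. Both No.3 (the final ACK of the handshake) and No.4 (the 100-byte data segment) carry a Win value, so if Win 2000 did not take effect, perhaps the sender was still limited by the earlier Win 10000. I modified the script as below; to keep the test simple, the two values were changed to 3000 and 2000, and the server still writes 4000 bytes.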
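As a quick sanity check of that reasoning, here is a minimal C sketch, assuming the send limit is simply min(cwnd × MSS, rwnd) and that only full MSS-sized segments are sent. The function names are hypothetical, not kernel APIs, and everything else the kernel checks (Nagle, pacing and so on) is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes the sender may have in flight, modeled as min(cwnd * mss, rwnd).
 * cwnd is counted in segments, as Linux does. */
static uint32_t usable_window(uint32_t cwnd_segs, uint32_t mss, uint32_t rwnd)
{
    uint32_t cwnd_bytes = cwnd_segs * mss;
    return cwnd_bytes < rwnd ? cwnd_bytes : rwnd;
}

/* Full MSS-sized segments that fit in the usable window,
 * capped by the amount of data the application wrote. */
static uint32_t segments_sendable(uint32_t cwnd_segs, uint32_t mss,
                                  uint32_t rwnd, uint32_t app_bytes)
{
    assert(mss > 0);
    uint32_t wnd = usable_window(cwnd_segs, mss, rwnd);
    uint32_t bytes = wnd < app_bytes ? wnd : app_bytes;
    return bytes / mss;
}
```

With cwnd = 10, MSS = 1000 and 4000 bytes written, segments_sendable(10, 1000, 2000, 4000) gives the expected 2, while an rwnd of 10000 gives 4. The four segments in the capture therefore correspond to an effective rwnd of 10000, not 2000.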
# cat tcp_troubleshooting_1_002.pkt
`ethtool -K tun0 tso off
ethtool -K tun0 gso off`
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < S 0:0(0) win 10000 <mss 1000,nop,nop,sackOK>
+0 > S. 0:0(0) ack 1 <...>
+0.01 < . 1:1(0) ack 1 win 3000
+0 accept(3, ..., ...) = 4
+0.01 < P. 1:101(100) ack 1 win 2000
+0 > . 1:1(0) ack 101
+0.01 write(4,...,4000) = 4000
+0 `sleep 10`
#
This time the server sends only three segments, so it is limited by the earlier Win 3000 rather than the most recent Win 2000.
# tcpdump -i any -nn port 8080
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
22:30:05.357883 tun0 In IP 192.0.2.1.40269 > 192.168.166.85.8080: Flags [S], seq 0, win 10000, options [mss 1000,nop,nop,sackOK], length 0
22:30:05.357911 tun0 Out IP 192.168.166.85.8080 > 192.0.2.1.40269: Flags [S.], seq 2967841851, ack 1, win 64240, options [mss 1460,nop,nop,sackOK], length 0
22:30:05.367982 tun0 In IP 192.0.2.1.40269 > 192.168.166.85.8080: Flags [.], ack 1, win 3000, length 0
22:30:05.378045 tun0 In IP 192.0.2.1.40269 > 192.168.166.85.8080: Flags [P.], seq 1:101, ack 1, win 2000, length 100: HTTP
22:30:05.378065 tun0 Out IP 192.168.166.85.8080 > 192.0.2.1.40269: Flags [.], ack 101, win 64140, length 0
22:30:05.388145 tun0 Out IP 192.168.166.85.8080 > 192.0.2.1.40269: Flags [.], seq 1:1001, ack 101, win 64140, length 1000: HTTP
22:30:05.388155 tun0 Out IP 192.168.166.85.8080 > 192.0.2.1.40269: Flags [P.], seq 1001:2001, ack 101, win 64140, length 1000: HTTP
22:30:05.388158 tun0 Out IP 192.168.166.85.8080 > 192.0.2.1.40269: Flags [.], seq 2001:3001, ack 101, win 64140, length 1000: HTTP
#
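So why does the server not update rwnd after receiving No.4, the segment with Win 2000? With that question in mind I went through the source code and a lot of material before finally finding the answer. The send window is updated in tcp_ack_update_window -> tcp_may_update_window, shown below.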
/* Check that window update is acceptable.
 * The function assumes that snd_una<=ack<=snd_next.
 */
static inline bool tcp_may_update_window(const struct tcp_sock *tp,
                                         const u32 ack, const u32 ack_seq,
                                         const u32 nwin)
{
        return  after(ack, tp->snd_una) ||
                after(ack_seq, tp->snd_wl1) ||
                (ack_seq == tp->snd_wl1 && nwin > tp->snd_wnd);
}
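One detail worth noting in the code above: comparisons go through after() rather than a plain >, because 32-bit TCP sequence numbers wrap around. Below is a minimal userspace sketch of the idea, modeled on the kernel's before()/after() helpers in include/net/tcp.h; the names seq_before and seq_after are hypothetical stand-ins, and the expression avoids relying on implementation-defined unsigned-to-signed conversion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if seq1 comes before seq2 in 32-bit modular sequence space.
 * Same result as the kernel's (s32)(seq1 - seq2) < 0: the wrapped
 * difference is "negative" exactly when its top bit is set. */
static bool seq_before(uint32_t seq1, uint32_t seq2)
{
    return ((uint32_t)(seq1 - seq2) & 0x80000000u) != 0;
}

/* True if seq1 comes after seq2 (mirrors after(seq2, seq1) -> before(seq1, seq2)). */
static bool seq_after(uint32_t seq1, uint32_t seq2)
{
    return seq_before(seq2, seq1);
}
```

For example, seq_after(5, 0xFFFFFFF0) is true because 5 lies just past the wrap point, even though 5 > 0xFFFFFFF0 is false as plain integers. Equal values are neither before nor after each other, which is why tcp_may_update_window() needs the separate ack_seq == snd_wl1 case.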
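tcp_may_update_window() checks three conditions, joined by ||: after(ack, tp->snd_una) || after(ack_seq, tp->snd_wl1) || (ack_seq == tp->snd_wl1 && nwin > tp->snd_wnd). Here ack is the acknowledgment number of the incoming segment, ack_seq is its sequence number, nwin is the window it advertises, and snd_wl1 records the sequence number of the segment that last updated the send window.
- after(ack, tp->snd_una): No.4's ACK does not acknowledge any new data (the server has not sent any yet, so ack 1 equals snd_una), so this is false;
- after(ack_seq, tp->snd_wl1): No.4's Seq Num (1) is the same as the snd_wl1 recorded earlier from No.3, so this is also false;
- (ack_seq == tp->snd_wl1 && nwin > tp->snd_wnd): No.4's Seq Num equals snd_wl1, so the first half holds, but nwin (2000) is not greater than snd_wnd (10000), so the second half fails and the whole condition is false.
All three are false, so the function returns false: the send window is not updated to 2000 and stays at 10000, which is why the server can send four 1000-byte segments.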
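To make the condition-by-condition evaluation concrete, here is a hedged userspace sketch that reports which of the three conditions hold instead of returning a single bool. struct snd_state, the WIN_COND_* flags and window_update_reasons() are hypothetical names, not kernel code, and sequence numbers are relative, as in the packetdrill scripts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flags, one per condition of tcp_may_update_window(). */
#define WIN_COND_NEW_ACK 0x1u /* after(ack, snd_una)                  */
#define WIN_COND_NEW_SEQ 0x2u /* after(ack_seq, snd_wl1)              */
#define WIN_COND_GROWING 0x4u /* ack_seq == snd_wl1 && nwin > snd_wnd */

/* seq1 is after seq2 in 32-bit modular sequence space. */
static int seq_after(uint32_t seq1, uint32_t seq2)
{
    return ((uint32_t)(seq2 - seq1) & 0x80000000u) != 0;
}

/* The sender fields the predicate reads. */
struct snd_state {
    uint32_t snd_una; /* oldest unacknowledged sequence number       */
    uint32_t snd_wl1; /* seq of the segment that last updated snd_wnd */
    uint32_t snd_wnd; /* send window currently in use                 */
};

/* Bitmask of the conditions that hold for an incoming segment with
 * acknowledgment number ack, sequence number ack_seq and window nwin.
 * A non-zero mask means the kernel would accept the window update. */
static unsigned window_update_reasons(const struct snd_state *s, uint32_t ack,
                                      uint32_t ack_seq, uint32_t nwin)
{
    unsigned mask = 0;
    if (seq_after(ack, s->snd_una))
        mask |= WIN_COND_NEW_ACK;
    if (seq_after(ack_seq, s->snd_wl1))
        mask |= WIN_COND_NEW_SEQ;
    if (ack_seq == s->snd_wl1 && nwin > s->snd_wnd)
        mask |= WIN_COND_GROWING;
    return mask;
}
```

For No.4 in script 001 (snd_una = 1, snd_wl1 = 1, snd_wnd = 10000; ack 1, seq 1, win 2000) the mask is 0, so no update happens. Evaluating every condition instead of short-circuiting gives the same yes/no answer as the kernel, since none of the checks has side effects.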
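The answer was found, but having got this far, I went on to verify it experimentally to consolidate the point.
Extended Experiments
The following three experiments each make one of the three conditions true, so that the send window gets updated.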
- after(ack, tp->snd_una)
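Modify the script so that the ACK acknowledges new data: below, the ACK carrying win 2000 has ack num 1001, which acknowledges the 1000 bytes from the first write.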
# cat tcp_troubleshooting_1_003.pkt
`ethtool -K tun0 tso off
ethtool -K tun0 gso off`
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < S 0:0(0) win 10000 <mss 1000,nop,nop,sackOK>
+0 > S. 0:0(0) ack 1 <...>
+0.01 < . 1:1(0) ack 1 win 10000
+0 accept(3, ..., ...) = 4
+0.01 < P. 1:101(100) ack 1 win 3000
+0 > . 1:1(0) ack 101
+0.01 write(4,...,1000) = 1000
+0.01 < . 101:101(0) ack 1001 win 2000
+0.01 write(4,...,4000) = 4000
#
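The Win 2000 limit now takes effect: the server can send only two MSS-sized (1000-byte) segments. Strictly speaking, this ACK's Seq num (101) is also greater than snd_wl1 (still 1, from No.3), so after(ack_seq, tp->snd_wl1) holds as well; the experiment confirms that the window gets updated but does not isolate the first condition on its own.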
# tcpdump -i any -nn port 8080
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
22:55:16.017895 ? In IP 192.0.2.1.36917 > 192.168.24.159.8080: Flags [S], seq 0, win 10000, options [mss 1000,nop,nop,sackOK], length 0
22:55:16.017930 ? Out IP 192.168.24.159.8080 > 192.0.2.1.36917: Flags [S.], seq 4123551004, ack 1, win 64240, options [mss 1460,nop,nop,sackOK], length 0
22:55:16.028014 ? In IP 192.0.2.1.36917 > 192.168.24.159.8080: Flags [.], ack 1, win 10000, length 0
22:55:16.038118 ? In IP 192.0.2.1.36917 > 192.168.24.159.8080: Flags [P.], seq 1:101, ack 1, win 3000, length 100: HTTP
22:55:16.038154 ? Out IP 192.168.24.159.8080 > 192.0.2.1.36917: Flags [.], ack 101, win 64140, length 0
22:55:16.048290 ? Out IP 192.168.24.159.8080 > 192.0.2.1.36917: Flags [P.], seq 1:1001, ack 101, win 64140, length 1000: HTTP
22:55:16.058342 ? In IP 192.0.2.1.36917 > 192.168.24.159.8080: Flags [.], ack 1001, win 2000, length 0
22:55:16.068472 ? Out IP 192.168.24.159.8080 > 192.0.2.1.36917: Flags [.], seq 1001:2001, ack 101, win 64140, length 1000: HTTP
22:55:16.068479 ? Out IP 192.168.24.159.8080 > 192.0.2.1.36917: Flags [P.], seq 2001:3001, ack 101, win 64140, length 1000: HTTP
#
- after(ack_seq, tp->snd_wl1)
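Modify the script so that the segment carrying win 2000 has Seq num 101, greater than the previous snd_wl1 of 1; its ack num is still 1, so the first condition does not hold.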
# cat tcp_troubleshooting_1_004.pkt
`ethtool -K tun0 tso off
ethtool -K tun0 gso off`
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < S 0:0(0) win 10000 <mss 1000,nop,nop,sackOK>
+0 > S. 0:0(0) ack 1 <...>
+0.01 < . 1:1(0) ack 1 win 10000
+0 accept(3, ..., ...) = 4
+0.01 < P. 1:101(100) ack 1 win 3000
+0 > . 1:1(0) ack 101
+0.01 < P. 101:201(100) ack 1 win 2000
+0 > . 1:1(0) ack 201
+0.01 write(4,...,4000) = 4000
#
The Win 2000 limit takes effect: the server can send only two MSS-sized (1000-byte) segments.
# tcpdump -i any -nn port 8080
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
23:00:54.817885 tun0 In IP 192.0.2.1.49051 > 192.168.214.190.8080: Flags [S], seq 0, win 10000, options [mss 1000,nop,nop,sackOK], length 0
23:00:54.817913 tun0 Out IP 192.168.214.190.8080 > 192.0.2.1.49051: Flags [S.], seq 2399892350, ack 1, win 64240, options [mss 1460,nop,nop,sackOK], length 0
23:00:54.827992 ? In IP 192.0.2.1.49051 > 192.168.214.190.8080: Flags [.], ack 1, win 10000, length 0
23:00:54.838065 ? In IP 192.0.2.1.49051 > 192.168.214.190.8080: Flags [P.], seq 1:101, ack 1, win 3000, length 100: HTTP
23:00:54.838084 ? Out IP 192.168.214.190.8080 > 192.0.2.1.49051: Flags [.], ack 101, win 64140, length 0
23:00:54.848151 ? In IP 192.0.2.1.49051 > 192.168.214.190.8080: Flags [P.], seq 101:201, ack 1, win 2000, length 100: HTTP
23:00:54.848170 ? Out IP 192.168.214.190.8080 > 192.0.2.1.49051: Flags [.], ack 201, win 64040, length 0
23:00:54.858248 ? Out IP 192.168.214.190.8080 > 192.0.2.1.49051: Flags [.], seq 1:1001, ack 201, win 64040, length 1000: HTTP
23:00:54.858252 ? Out IP 192.168.214.190.8080 > 192.0.2.1.49051: Flags [P.], seq 1001:2001, ack 201, win 64040, length 1000: HTTP
#
- (ack_seq == tp->snd_wl1 && nwin > tp->snd_wnd)
Modify the script so that No.4 keeps the same Seq num as snd_wl1 while its nwin (3000) is greater than snd_wnd (2000, set by No.3).
# cat tcp_troubleshooting_1_005.pkt
`ethtool -K tun0 tso off
ethtool -K tun0 gso off`
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < S 0:0(0) win 10000 <mss 1000,nop,nop,sackOK>
+0 > S. 0:0(0) ack 1 <...>
+0.01 < . 1:1(0) ack 1 win 2000
+0 accept(3, ..., ...) = 4
+0.01 < P. 1:101(100) ack 1 win 3000
+0 > . 1:1(0) ack 101
+0.01 write(4,...,4000) = 4000
#
The Win 3000 limit takes effect: the server can send three MSS-sized (1000-byte) segments.
# tcpdump -i any -nn port 8080
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
23:03:00.017903 ? In IP 192.0.2.1.34617 > 192.168.35.60.8080: Flags [S], seq 0, win 10000, options [mss 1000,nop,nop,sackOK], length 0
23:03:00.017940 ? Out IP 192.168.35.60.8080 > 192.0.2.1.34617: Flags [S.], seq 3161188315, ack 1, win 64240, options [mss 1460,nop,nop,sackOK], length 0
23:03:00.028034 ? In IP 192.0.2.1.34617 > 192.168.35.60.8080: Flags [.], ack 1, win 2000, length 0
23:03:00.038103 ? In IP 192.0.2.1.34617 > 192.168.35.60.8080: Flags [P.], seq 1:101, ack 1, win 3000, length 100: HTTP
23:03:00.038126 ? Out IP 192.168.35.60.8080 > 192.0.2.1.34617: Flags [.], ack 101, win 64140, length 0
23:03:00.048222 ? Out IP 192.168.35.60.8080 > 192.0.2.1.34617: Flags [.], seq 1:1001, ack 101, win 64140, length 1000: HTTP
23:03:00.048232 ? Out IP 192.168.35.60.8080 > 192.0.2.1.34617: Flags [P.], seq 1001:2001, ack 101, win 64140, length 1000: HTTP
23:03:00.048235 ? Out IP 192.168.35.60.8080 > 192.0.2.1.34617: Flags [.], seq 2001:3001, ack 101, win 64140, length 1000: HTTP
#
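As a final cross-check, the five scripts can be replayed through a simplified model of the sender. This is a sketch under stated assumptions, not the kernel's code path: the send limit is snd_una + snd_wnd (the same quantity the kernel's tcp_wnd_end() returns), cwnd is ignored because 10 segments never binds here, and the handshake's final ACK (No.3) is assumed to seed snd_wnd and snd_wl1 directly, which is consistent with script 002. All struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MSS 1000u /* MSS advertised in every script's SYN */

static int seq_after(uint32_t seq1, uint32_t seq2)
{
    return ((uint32_t)(seq2 - seq1) & 0x80000000u) != 0;
}

/* Sender state with relative sequence numbers, as in the scripts. */
struct sender {
    uint32_t snd_una, snd_nxt, snd_wl1, snd_wnd;
};

/* Assumption: No.3 seeds snd_wnd and snd_wl1 directly. */
static struct sender handshake_done(uint32_t seq, uint32_t win)
{
    struct sender s = { .snd_una = 1, .snd_nxt = 1, .snd_wl1 = seq, .snd_wnd = win };
    return s;
}

/* Simplified tcp_ack_update_window(): take the new window only when one of
 * the three tcp_may_update_window() conditions holds, then advance snd_una. */
static void on_ack(struct sender *s, uint32_t seq, uint32_t ack, uint32_t win)
{
    if (seq_after(ack, s->snd_una) || seq_after(seq, s->snd_wl1) ||
        (seq == s->snd_wl1 && win > s->snd_wnd)) {
        s->snd_wl1 = seq;
        s->snd_wnd = win;
    }
    if (seq_after(ack, s->snd_una))
        s->snd_una = ack;
}

/* Send full-MSS segments while each one ends at or before snd_una + snd_wnd;
 * returns how many segments were sent. */
static unsigned on_write(struct sender *s, uint32_t len)
{
    unsigned sent = 0;
    while (len >= MSS && !seq_after(s->snd_nxt + MSS, s->snd_una + s->snd_wnd)) {
        s->snd_nxt += MSS;
        len -= MSS;
        sent++;
    }
    return sent;
}

/* Each replay returns the segment count for the script's final write. */
static unsigned replay_001(void)
{
    struct sender s = handshake_done(1, 10000);
    on_ack(&s, 1, 1, 2000);      /* P. 1:101 ack 1 win 2000 */
    return on_write(&s, 4000);
}

static unsigned replay_002(void)
{
    struct sender s = handshake_done(1, 3000);
    on_ack(&s, 1, 1, 2000);      /* P. 1:101 ack 1 win 2000 */
    return on_write(&s, 4000);
}

static unsigned replay_003(void)
{
    struct sender s = handshake_done(1, 10000);
    on_ack(&s, 1, 1, 3000);      /* P. 1:101 ack 1 win 3000 */
    on_write(&s, 1000);          /* first write: 1:1001 */
    on_ack(&s, 101, 1001, 2000); /* . 101:101 ack 1001 win 2000 */
    return on_write(&s, 4000);
}

static unsigned replay_004(void)
{
    struct sender s = handshake_done(1, 10000);
    on_ack(&s, 1, 1, 3000);      /* P. 1:101 ack 1 win 3000 */
    on_ack(&s, 101, 1, 2000);    /* P. 101:201 ack 1 win 2000 */
    return on_write(&s, 4000);
}

static unsigned replay_005(void)
{
    struct sender s = handshake_done(1, 2000);
    on_ack(&s, 1, 1, 3000);      /* P. 1:101 ack 1 win 3000 */
    return on_write(&s, 4000);
}
```

replay_001() through replay_005() return 4, 3, 2, 2 and 3 segments respectively, matching the tcpdump captures above.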
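Summary
In short, Linux accepts a new send window from an incoming segment only if it acknowledges new data, carries a newer sequence number than the segment that last updated the window, or carries the same sequence number with a larger window; a segment that only shrinks the window without advancing either number is ignored. Have you run into this in a real-world scenario? Yet another piece of odd knowledge picked up.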