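Background
lettuce connects to a Redis master/replica instance. When the host of the master node loses power abnormally and restarts, no RST packet is sent, so lettuce keeps reusing the old TCP connection and commands start timing out. The application keeps reporting the error "io.lettuce.core.RedisCommandTimeoutException: Command timed out after" until, about 15 minutes later, TCP gives up retransmitting and closes the connection, after which it recovers.
For the detailed cause, see: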
https://juejin.cn/post/7494573414150438927#heading-5
https://www.modb.pro/db/1711557186789908480
https://github.com/lettuce-io/lettuce-core/issues/2082
https://support.huaweicloud.com/usermanual-dcs/dcs-ug-211105002.html
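Solution
The simplest fix is to lower the TCP retransmission count on the client operating system, for example to 8, so that the connection no longer waits 15 minutes before it is dropped: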
sysctl -w net.ipv4.tcp_retries2=8
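However, this setting has a wide impact because it also affects every other TCP connection on the host, so this option was ruled out.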
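To show where the 15-minute figure comes from, here is a minimal sketch that adds up the retransmission intervals for a given tcp_retries2 value. It assumes the Linux defaults of a 200 ms minimum RTO that doubles on each retry and is capped at 120 s. The class and method names are illustrative only, and the real RTO depends on the measured round-trip time, so the result is a hypothetical lower bound rather than an exact timeout.

```java
// Hypothetical helper: estimates how long Linux keeps retransmitting
// before giving up, for a given net.ipv4.tcp_retries2 value.
final class RetransmitTimeout {
    // Assumed kernel constants (Linux TCP_RTO_MIN and TCP_RTO_MAX).
    static final long RTO_MIN_MS = 200;
    static final long RTO_MAX_MS = 120_000;

    // Sums retries2 + 1 backoff intervals, doubling each time up to the cap.
    static long hypotheticalTimeoutMillis(int retries2) {
        long total = 0;
        long rto = RTO_MIN_MS;
        for (int i = 0; i <= retries2; i++) {
            total += rto;
            rto = Math.min(rto * 2, RTO_MAX_MS);
        }
        return total;
    }

    public static void main(String[] args) {
        // Default value 15: 924.6 s, roughly 15 minutes.
        System.out.println(hypotheticalTimeoutMillis(15) / 1000.0 + " s");
        // Value 8: 102.2 s.
        System.out.println(hypotheticalTimeoutMillis(8) / 1000.0 + " s");
    }
}
```

Under these assumptions the default of 15 gives about 924.6 seconds, which matches the roughly 15 minutes observed, while 8 gives about 102 seconds.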
So we followed the officially recommended approach (see https://github.com/lettuce-io/lettuce-core/issues/2082
and https://support.huaweicloud.com/usermanual-dcs/dcs-ug-211105002.html
): upgrade the lettuce dependency to 6.3.0 and set the tcpUserTimeout parameter.
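After upgrading and setting tcpUserTimeout, the application log showed that the parameter was not applied. Testing confirmed that it had no effect and the problem remained.
The following dependencies were used: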
<dependency>
    <groupId>io.lettuce</groupId>
    <artifactId>lettuce-core</artifactId>
    <version>6.3.0.RELEASE</version>
</dependency>
<dependency>
    <groupId>io.netty</groupId>
    <artifactId>netty-transport-native-epoll</artifactId>
    <version>4.1.100.Final</version>
    <classifier>linux-x86_64</classifier>
</dependency>
The log shows:
2025-05-06 13:40:05.024 WARN 1939829 --- [nio-8888-exec-1] io.lettuce.core.ConnectionBuilder : Cannot apply TCP User Timeout options to channel type io.netty.channel.socket.nio.NioSocketChannel
2025-05-06 13:40:05.390 WARN 1939829 --- [ioEventLoop-6-1] io.lettuce.core.ConnectionBuilder : Cannot apply TCP User Timeout options to channel type io.netty.channel.socket.nio.NioSocketChannel
2025-05-06 13:40:05.394 WARN 1939829 --- [ioEventLoop-6-1] io.lettuce.core.ConnectionBuilder : Cannot apply TCP User Timeout options to channel type io.netty.channel.socket.nio.NioSocketChannel
2025-05-06 13:40:05.396 WARN 1939829 --- [ioEventLoop-6-1] io.lettuce.core.ConnectionBuilder : Cannot apply TCP User Timeout options to channel type io.netty.channel.socket.nio.NioSocketChannel
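The warning names NioSocketChannel, which suggests that the native epoll transport was not actually in use even though its artifact was declared. A minimal sketch for checking this at startup is shown here. It calls netty's io.netty.channel.epoll.Epoll class through reflection so that it compiles without netty on the classpath. The EpollCheck class and its describe method are hypothetical names introduced for illustration.

```java
import java.lang.reflect.Method;

// Hypothetical diagnostic: reports whether a netty native transport class
// (for example io.netty.channel.epoll.Epoll) is loadable and available.
final class EpollCheck {
    static String describe(String className) {
        try {
            Class<?> transport = Class.forName(className);
            // Epoll exposes static isAvailable() and unavailabilityCause().
            Method isAvailable = transport.getMethod("isAvailable");
            boolean available = (Boolean) isAvailable.invoke(null);
            if (available) {
                return "available";
            }
            Method cause = transport.getMethod("unavailabilityCause");
            return "unavailable: " + cause.invoke(null);
        } catch (ClassNotFoundException e) {
            return "class not found: " + className;
        } catch (ReflectiveOperationException | LinkageError e) {
            // Covers a missing method or a native library that fails to load.
            return "check failed: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("io.netty.channel.epoll.Epoll"));
    }
}
```

If this prints anything other than "available", lettuce is likely to fall back to NIO, and the reported cause may help to locate the conflicting netty module.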
Analysis of the dependencies pointed to a netty conflict, so the dependency configuration was changed as follows:
<dependency>
    <groupId>io.lettuce</groupId>
    <artifactId>lettuce-core</artifactId>
    <version>6.3.0.RELEASE</version>
</dependency>
<dependency>
    <groupId>io.netty</groupId>
    <artifactId>netty-all</artifactId>
    <version>4.1.100.Final</version>
</dependency>
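The application no longer prints the warning at startup, and a reconnect shows up in the log 30 seconds after the failure, which indicates that the tcpUserTimeout parameter has taken effect.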
2025-05-07 09:07:59.411 INFO 2112812 --- [xecutorLoop-1-6] i.l.core.protocol.ConnectionWatchdog : Reconnecting, last destination was 10.50.190.43:6379
2025-05-07 09:07:59.419 INFO 2112812 --- [llEventLoop-6-3] i.l.core.protocol.ReconnectionHandler : Reconnected to 10.50.190.43:6379
The Redis configuration class is as follows:
import java.time.Duration;

import io.lettuce.core.ClientOptions;
import io.lettuce.core.ReadFrom;
import io.lettuce.core.SocketOptions;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisStandaloneConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceClientConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory;
import org.springframework.data.redis.core.RedisTemplate;

@Configuration
public class RedisConfig {

    @Value("${spring.redis.host}")
    private String redisHost;

    @Value("${spring.redis.port:6379}")
    private Integer redisPort = 6379;

    @Value("${spring.redis.database:0}")
    private Integer redisDatabase = 0;

    @Value("${spring.redis.password:}")
    private String redisPassword;

    @Value("${spring.redis.connect.timeout:2000}")
    private Integer redisConnectTimeout = 2000;

    @Value("${spring.redis.read.timeout:2000}")
    private Integer redisReadTimeout = 2000;

    /**
     * TCP keepalive settings, in seconds:
     * idle time before the first keepalive probe = TCP_KEEPALIVE_TIME = 30
     * interval between keepalive probes = TCP_KEEPALIVE_TIME / 3 = 10
     * unanswered probes before the connection is dropped = 3
     */
    private static final int TCP_KEEPALIVE_TIME = 30;

    /**
     * TCP_USER_TIMEOUT in seconds: how long sent data may remain unacknowledged
     * before the connection is closed. Works around the long lettuce timeouts.
     * refer: https://github.com/lettuce-io/lettuce-core/issues/2082
     */
    private static final int TCP_USER_TIMEOUT = 30;

    @Bean
    public LettuceConnectionFactory redisConnectionFactory(LettuceClientConfiguration clientConfiguration) {
        RedisStandaloneConfiguration standaloneConfiguration = new RedisStandaloneConfiguration();
        standaloneConfiguration.setHostName(redisHost);
        standaloneConfiguration.setPort(redisPort);
        standaloneConfiguration.setDatabase(redisDatabase);
        standaloneConfiguration.setPassword(redisPassword);

        LettuceConnectionFactory connectionFactory =
                new LettuceConnectionFactory(standaloneConfiguration, clientConfiguration);
        connectionFactory.setDatabase(redisDatabase);
        return connectionFactory;
    }

    @Bean
    public LettuceClientConfiguration clientConfiguration() {
        SocketOptions socketOptions = SocketOptions.builder()
                .keepAlive(SocketOptions.KeepAliveOptions.builder()
                        // idle time before keepalive probing starts
                        .idle(Duration.ofSeconds(TCP_KEEPALIVE_TIME))
                        // interval between two keepalive probes
                        .interval(Duration.ofSeconds(TCP_KEEPALIVE_TIME / 3))
                        // number of unanswered probes before the connection is dropped
                        .count(3)
                        // turn keepalive on
                        .enable()
                        .build())
                .tcpUserTimeout(SocketOptions.TcpUserTimeoutOptions.builder()
                        // avoids the long timeout when the server goes away without sending RST
                        .tcpUserTimeout(Duration.ofSeconds(TCP_USER_TIMEOUT))
                        .enable()
                        .build())
                // TCP connect timeout
                .connectTimeout(Duration.ofMillis(redisConnectTimeout))
                .build();

        ClientOptions clientOptions = ClientOptions.builder()
                .autoReconnect(true)
                .pingBeforeActivateConnection(true)
                .cancelCommandsOnReconnectFailure(false)
                .disconnectedBehavior(ClientOptions.DisconnectedBehavior.ACCEPT_COMMANDS)
                .socketOptions(socketOptions)
                .build();

        LettuceClientConfiguration clientConfiguration = LettuceClientConfiguration.builder()
                .commandTimeout(Duration.ofMillis(redisReadTimeout))
                .readFrom(ReadFrom.MASTER)
                .clientOptions(clientOptions)
                .build();
        return clientConfiguration;
    }

    @Bean
    RedisTemplate<String, Object> redisTemplate(LettuceConnectionFactory redisConnectionFactory) {
        RedisTemplate<String, Object> template = new RedisTemplate<>();
        template.setConnectionFactory(redisConnectionFactory);
        // Print the effective TCP_USER_TIMEOUT at startup to confirm the option was set.
        System.out.println("SocketOptions: " + redisConnectionFactory.getClientConfiguration()
                .getClientOptions().get().getSocketOptions().getTcpUserTimeout().getTcpUserTimeout().toString());
        return template;
    }
}
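To relate the keepalive values in this configuration to a detection time, the sketch below estimates how long keepalive alone would need to declare an idle connection dead, assuming the usual Linux behaviour of idle time plus count times interval. The class and method names are illustrative only. Note also that tcp(7) says TCP_USER_TIMEOUT takes over this decision when both options are set, so the figure describes keepalive on its own.

```java
// Hypothetical helper: estimates dead-peer detection time for TCP keepalive alone.
final class KeepAliveMath {
    // The first probe is sent after idleSec of silence, then one probe every
    // intervalSec, and the connection is dropped after count unanswered probes.
    static long idleDetectionSeconds(long idleSec, long intervalSec, int count) {
        return idleSec + intervalSec * count;
    }

    public static void main(String[] args) {
        // Values from the configuration class: idle 30 s, interval 10 s, count 3.
        System.out.println(idleDetectionSeconds(30, 10, 3) + " s"); // prints "60 s"
    }
}
```

With the values used here, keepalive alone would take about 60 seconds on an idle connection, whereas the reconnect observed 30 seconds after the failure is consistent with the 30-second TCP_USER_TIMEOUT.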