Spring Boot Distributed Project Exception Handling in Practice: From the Brink of Collapse to Graceful Recovery

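When a monolithic application is split into a distributed system, exceptions become a Pandora's box thrown open: RPC call timeouts, distributed transaction avalanches, third-party APIs that suddenly stop responding, and more. During a recent e-commerce sales promotion, our system went through exactly this kind of darkest hour. This article uses real code examples to show how to build an exception handling system for distributed services.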

1. Global Exception Interception: The Last Line of Defense

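import feign.FeignException;
import lombok.extern.slf4j.Slf4j;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.RestControllerAdvice;

@Slf4j
@RestControllerAdvice
public class GlobalExceptionHandler {

    // Handle business exceptions
    @ExceptionHandler(BizException.class)
    public Result handleBizException(BizException e) {
        log.error("Business exception: {}", e.getErrorMsg());
        return Result.fail(e.getErrorCode(), e.getErrorMsg());
    }

    // Handle Feign call exceptions; log the raw response body of the remote service
    @ExceptionHandler(FeignException.class)
    public Result handleFeignException(FeignException e) {
        log.error("Service call exception: {}", e.contentUTF8(), e);
        return Result.fail(ErrorCode.SERVICE_UNAVAILABLE);
    }

    // Catch-all handler; pass the exception itself so the stack trace is logged
    @ExceptionHandler(Exception.class)
    public Result handleException(Exception e) {
        log.error("System exception", e);
        return Result.fail(ErrorCode.SYSTEM_ERROR);
    }
}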

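Key point: @RestControllerAdvice gives three layers of protection (business exceptions, Feign call exceptions, and a catch-all). Handle Feign exceptions separately so that the original error information returned by the downstream service is preserved in the log.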
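The handler above relies on Result, ErrorCode, and BizException, none of which the article shows. The following is a minimal sketch of what they might look like; every field name, code value, and message here is an assumption made for illustration, not the author's actual definition.

```java
// Hypothetical support types assumed by GlobalExceptionHandler; the article does not show them.
enum ErrorCode {
    SERVICE_UNAVAILABLE(503, "Service unavailable"),
    SERVICE_DEGRADE(503, "Service degraded"),
    SYSTEM_ERROR(500, "System error");

    final int code;
    final String msg;

    ErrorCode(int code, String msg) {
        this.code = code;
        this.msg = msg;
    }
}

// Uniform response envelope returned by every handler.
final class Result<T> {
    final int code;
    final String msg;
    final T data;

    private Result(int code, String msg, T data) {
        this.code = code;
        this.msg = msg;
        this.data = data;
    }

    static <T> Result<T> ok(T data) {
        return new Result<>(0, "OK", data);
    }

    static <T> Result<T> fail(int code, String msg) {
        return new Result<>(code, msg, null);
    }

    static <T> Result<T> fail(ErrorCode errorCode) {
        return new Result<>(errorCode.code, errorCode.msg, null);
    }
}

// Business exception carrying a code and a message that is safe to show to callers.
class BizException extends RuntimeException {
    private final int errorCode;
    private final String errorMsg;

    BizException(int errorCode, String errorMsg) {
        super(errorMsg);
        this.errorCode = errorCode;
        this.errorMsg = errorMsg;
    }

    int getErrorCode() {
        return errorCode;
    }

    String getErrorMsg() {
        return errorMsg;
    }
}
```

Keeping the caller-facing message on BizException separate from the stack trace lets the handler return something safe to display while the log keeps the technical detail.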

2. Handling Exceptions in Inter-Service Calls

Feign + Sentinel double-insurance configuration:

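feign:
  sentinel:
    enabled: true              # lets Sentinel wrap Feign clients so fallbacks take effect
  client:
    config:
      default:
        connectTimeout: 3000   # milliseconds
        readTimeout: 5000      # milliseconds
spring:
  cloud:
    sentinel:
      scg:                     # Spring Cloud Gateway fallback settings
        fallback:
          mode: response
          response-body: '{"code":503,"msg":"Service degraded"}'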

Custom FallbackFactory:

@Component
public class OrderServiceFallbackFactory implements FallbackFactory<OrderServiceClient> {

    @Override
    public OrderServiceClient create(Throwable cause) {
        return new OrderServiceClient() {
            @Override
            public Result<OrderDTO> getOrder(String orderId) {
                // Keep the business error code when the cause is a business exception
                if (cause instanceof BizException) {
                    return Result.fail(((BizException) cause).getErrorCode(), "Order service exception");
                }
                // Any other failure (timeout, open circuit, and so on) gets the generic degrade response
                return Result.fail(ErrorCode.SERVICE_DEGRADE);
            }
        };
    }
}

From practice: during the Double 11 sales promotion, this combined strategy helped us intercept more than 70% of cascading failures.

3. Handling Exceptions in Distributed Transactions

An example using Seata's TCC mode:

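@LocalTCC
public interface OrderTccAction {

    // Phase 1 (Try): reserve resources; Seata then calls commit or rollback in phase 2
    @TwoPhaseBusinessAction(name = "prepareCreateOrder", commitMethod = "commit", rollbackMethod = "rollback")
    boolean prepare(BusinessActionContext actionContext,
                    @BusinessActionContextParameter(paramName = "order") Order order);

    // Phase 2 (Confirm): make the reserved resources permanent
    boolean commit(BusinessActionContext actionContext);

    // Phase 2 (Cancel): release whatever prepare reserved
    boolean rollback(BusinessActionContext actionContext);
}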
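The interface above only declares the three TCC methods. What makes an implementation safe under failures is that commit and rollback are idempotent and that a rollback arriving before its prepare is tolerated. The following in-memory sketch illustrates those rules with a stock reservation; the class, its fields, and the use of a plain string transaction ID are all assumptions for illustration and are not part of Seata or of the author's code.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory sketch of Try/Confirm/Cancel for a stock reservation. All names are hypothetical.
class StockTccAction {
    enum Phase { TRIED, COMMITTED, ROLLED_BACK }

    private int available;
    private final Map<String, Integer> reserved = new HashMap<>(); // xid -> frozen quantity
    private final Map<String, Phase> phases = new HashMap<>();     // xid -> last phase seen

    StockTccAction(int available) {
        this.available = available;
    }

    // Try: freeze stock without consuming it.
    synchronized boolean prepare(String xid, int quantity) {
        Phase phase = phases.get(xid);
        if (phase == Phase.ROLLED_BACK) {
            return false; // the rollback arrived first, so refuse the late Try
        }
        if (phase != null) {
            return true;  // duplicate Try, already handled
        }
        if (quantity > available) {
            return false;
        }
        available -= quantity;
        reserved.put(xid, quantity);
        phases.put(xid, Phase.TRIED);
        return true;
    }

    // Confirm: consume the frozen stock. Safe to call more than once.
    synchronized boolean commit(String xid) {
        Phase phase = phases.get(xid);
        if (phase == Phase.COMMITTED) {
            return true;
        }
        if (phase != Phase.TRIED) {
            return false;
        }
        reserved.remove(xid);
        phases.put(xid, Phase.COMMITTED);
        return true;
    }

    // Cancel: release the frozen stock. Safe to call more than once and without a prior Try.
    synchronized boolean rollback(String xid) {
        Phase phase = phases.get(xid);
        if (phase == Phase.COMMITTED) {
            return false; // a committed branch cannot be cancelled here
        }
        if (phase == Phase.TRIED) {
            available += reserved.remove(xid);
        }
        phases.put(xid, Phase.ROLLED_BACK);
        return true;
    }

    synchronized int available() {
        return available;
    }
}
```

In a real service the two maps would be a database table updated in the same local transaction as the business data, so that the recorded phase survives a restart.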

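Compensation strategies:

Automatic retry: for transient errors such as network jitter
Manual intervention: for serious problems such as data inconsistency
Transaction log: record the key operation checkpoints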
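The first two strategies can be combined in one helper: retry transient failures with a growing delay, and stop immediately on anything else so that it can be escalated to a person. This is a minimal sketch under stated assumptions; TransientException and RetryingExecutor are hypothetical names, and how a failure gets classified as transient is left to the caller.

```java
import java.util.function.Supplier;

// Hypothetical marker for failures that are worth retrying (timeouts, network jitter).
class TransientException extends RuntimeException {
    TransientException(String message) {
        super(message);
    }
}

class RetryingExecutor {
    private final int maxAttempts;
    private final long baseDelayMillis;

    RetryingExecutor(int maxAttempts, long baseDelayMillis) {
        if (maxAttempts < 1 || baseDelayMillis < 0) {
            throw new IllegalArgumentException("maxAttempts must be >= 1 and baseDelayMillis >= 0");
        }
        this.maxAttempts = maxAttempts;
        this.baseDelayMillis = baseDelayMillis;
    }

    // Runs the task, retrying only on TransientException. Any other exception
    // propagates at once because repeating the call would not help.
    <T> T execute(Supplier<T> task) {
        TransientException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.get();
            } catch (TransientException e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Exponential backoff: base, 2 x base, 4 x base, and so on
                    sleep(baseDelayMillis << (attempt - 1));
                }
            }
        }
        // Out of attempts: surface the failure so it can be handled manually
        throw new IllegalStateException("Retries exhausted, manual intervention required", last);
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry", e);
        }
    }
}
```

Retrying is only safe when the wrapped operation is idempotent, which is one reason the transaction log matters: it tells the retry whether the previous attempt already took effect.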

4. Strategies for Handling Traffic Peaks

Resilience4j circuit breaker configuration:

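CircuitBreakerConfig config = CircuitBreakerConfig.custom()
        .failureRateThreshold(50)                          // open once at least 50% of recorded calls fail
        .waitDurationInOpenState(Duration.ofMillis(1000))  // stay open for 1 s before probing again
        .permittedNumberOfCallsInHalfOpenState(2)          // ringBufferSizeInHalfOpenState in older releases
        .slidingWindowSize(2)                              // ringBufferSizeInClosedState in older releases
        .build();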
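To make the meaning of these settings concrete (the failure-rate threshold, the wait in the open state, and the two call-count settings), here is a deliberately simplified state machine. It is a sketch for illustration only and not how Resilience4j is implemented: it evaluates calls in fixed batches instead of a sliding window, it does not cap the number of concurrent probes in the half-open state, and the class name and clock parameter are assumptions.

```java
import java.util.function.LongSupplier;

// Simplified circuit breaker: CLOSED -> OPEN -> HALF_OPEN -> CLOSED or OPEN. Names are hypothetical.
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final double failureRateThreshold; // percent, 0-100
    private final int closedWindowSize;        // calls evaluated per batch while CLOSED
    private final int halfOpenCalls;           // probe calls evaluated while HALF_OPEN
    private final long openMillis;             // how long to reject calls while OPEN
    private final LongSupplier clock;          // injected so the sketch can be tested without sleeping

    private State state = State.CLOSED;
    private int calls;
    private int failures;
    private long openedAt;

    SimpleCircuitBreaker(double failureRateThreshold, int closedWindowSize,
                         int halfOpenCalls, long openMillis, LongSupplier clock) {
        this.failureRateThreshold = failureRateThreshold;
        this.closedWindowSize = closedWindowSize;
        this.halfOpenCalls = halfOpenCalls;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    // Ask before every call; false means fail fast without touching the downstream service.
    synchronized boolean allowRequest() {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < openMillis) {
                return false;
            }
            transitionTo(State.HALF_OPEN); // wait elapsed: let probe calls through
        }
        return true;
    }

    // Report the outcome of every call that was allowed.
    synchronized void record(boolean success) {
        if (state == State.OPEN) {
            return;
        }
        calls++;
        if (!success) {
            failures++;
        }
        int needed = (state == State.HALF_OPEN) ? halfOpenCalls : closedWindowSize;
        if (calls < needed) {
            return; // not enough data yet to judge the failure rate
        }
        if (failures * 100.0 / calls >= failureRateThreshold) {
            openedAt = clock.getAsLong();
            transitionTo(State.OPEN);
        } else {
            transitionTo(State.CLOSED);
        }
    }

    synchronized State state() {
        return state;
    }

    private void transitionTo(State next) {
        state = next;
        calls = 0;
        failures = 0;
    }
}
```

With a window of only 2 calls, a single failure out of two is enough to open the circuit, so values this small suit a demonstration more than production traffic.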

Adaptive rate-limiting algorithm:

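@Slf4j
public abstract class AdaptiveLimiter {

    private final AtomicInteger currentLimit = new AtomicInteger(100);

    public boolean tryAcquire() {
        int current = currentLimit.get();
        if (current <= 0) return false;

        // Adjust dynamically based on response time (RT) and success rate
        double successRate = getRecentSuccessRate();   // percentage, 0-100
        long avgRT = getAvgResponseTime();             // milliseconds
        if (successRate < 90 || avgRT > 500) {
            // Degraded: halve the limit, but never go below 10
            currentLimit.updateAndGet(x -> Math.max(x / 2, 10));
        } else if (successRate > 99 && avgRT < 100) {
            // Healthy: double the limit, capped at 1000
            currentLimit.updateAndGet(x -> Math.min(x * 2, 1000));
        }
        return true;
    }

    // The metric sources are not shown in the article; subclasses supply them
    protected abstract double getRecentSuccessRate();

    protected abstract long getAvgResponseTime();
}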
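The class above adjusts the limit but leaves out where the metrics come from and never compares the limit with the number of requests actually in flight. The following is one way to fill those gaps, as a sketch under stated assumptions: the class name, the batch-based metric window, and the release method are all hypothetical, while the thresholds (90% and 500 ms, 99% and 100 ms) and the bounds (10 to 1000) are taken from the code above.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Concurrency limiter whose limit follows recent success rate and response time. Names are hypothetical.
class WindowedAdaptiveLimiter {
    private static final int MIN_LIMIT = 10;
    private static final int MAX_LIMIT = 1000;

    private final int windowSize; // completed calls per evaluation batch
    private final AtomicInteger limit = new AtomicInteger(100);
    private final AtomicInteger inFlight = new AtomicInteger();

    // Statistics of the current batch, guarded by 'this'
    private int calls;
    private int successes;
    private long totalRtMillis;

    WindowedAdaptiveLimiter(int windowSize) {
        if (windowSize < 1) {
            throw new IllegalArgumentException("windowSize must be >= 1");
        }
        this.windowSize = windowSize;
    }

    // Admits the request only while in-flight requests stay within the current limit.
    boolean tryAcquire() {
        if (inFlight.incrementAndGet() > limit.get()) {
            inFlight.decrementAndGet();
            return false;
        }
        return true;
    }

    // Must be called exactly once for every successful tryAcquire().
    void release(boolean success, long rtMillis) {
        inFlight.decrementAndGet();
        synchronized (this) {
            calls++;
            if (success) {
                successes++;
            }
            totalRtMillis += rtMillis;
            if (calls < windowSize) {
                return;
            }
            double successRate = successes * 100.0 / calls; // percentage
            long avgRt = totalRtMillis / calls;             // milliseconds
            if (successRate < 90 || avgRt > 500) {
                limit.updateAndGet(x -> Math.max(x / 2, MIN_LIMIT));
            } else if (successRate > 99 && avgRt < 100) {
                limit.updateAndGet(x -> Math.min(x * 2, MAX_LIMIT));
            }
            // Start a fresh batch
            calls = 0;
            successes = 0;
            totalRtMillis = 0;
        }
    }

    int currentLimit() {
        return limit.get();
    }
}
```

Evaluating once per batch rather than on every request keeps the limit from oscillating on a single slow call, at the cost of reacting one batch late.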

5. Three Essential Moves for Exception Tracing

End-to-end tracing:

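@Slf4j
public class TraceInterceptor implements HandlerInterceptor {

    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) {
        MDC.put("traceId", UUID.randomUUID().toString());
        return true;   // continue the handler chain
    }

    @Override
    public void afterCompletion(HttpServletRequest request, HttpServletResponse response,
                                Object handler, Exception ex) {
        MDC.remove("traceId");   // pooled threads are reused, so clear the ID after each request
    }
}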
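The interceptor above generates a fresh ID for every request, which only links log lines within one service. For a trace to span several services, each service has to reuse the ID it received and pass it on with its own outgoing calls. The following framework-free sketch shows that rule; the header name X-Trace-Id, the TraceContext class, and the use of a plain map for headers are assumptions for illustration.

```java
import java.util.Map;
import java.util.UUID;

// Holds the trace ID of the request being handled on the current thread. Names are hypothetical.
class TraceContext {
    static final String HEADER = "X-Trace-Id"; // assumed header name

    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    // Reuse the caller's trace ID when present, otherwise start a new trace.
    static String begin(Map<String, String> incomingHeaders) {
        String traceId = incomingHeaders.get(HEADER);
        if (traceId == null || traceId.isEmpty()) {
            traceId = UUID.randomUUID().toString();
        }
        CURRENT.set(traceId);
        return traceId;
    }

    static String current() {
        return CURRENT.get();
    }

    // Copy the current trace ID into the headers of an outgoing call.
    static void injectInto(Map<String, String> outgoingHeaders) {
        String traceId = CURRENT.get();
        if (traceId != null) {
            outgoingHeaders.put(HEADER, traceId);
        }
    }

    // Clear at the end of the request so a reused thread does not leak the ID.
    static void end() {
        CURRENT.remove();
    }
}
```

In a Spring application, begin and end would map onto preHandle and afterCompletion, and injectInto onto a request interceptor of the HTTP client. Real HTTP header names are case-insensitive, which a plain map does not model.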

Exception profiling system:

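@Aspect
@Component
public class ExceptionMonitor {

    private final KafkaTemplate<String, ExceptionMetric> kafkaTemplate;

    public ExceptionMonitor(KafkaTemplate<String, ExceptionMetric> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    @AfterThrowing(pointcut = "execution(* com..*.*(..))", throwing = "ex")
    public void monitorException(Exception ex) {
        ExceptionMetric metric = new ExceptionMetric(
                ex.getClass().getSimpleName(),
                Thread.currentThread().getName(),
                System.currentTimeMillis());
        // send() is an instance method, so use the injected template
        kafkaTemplate.send("exception_metrics", metric);
    }
}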

Intelligent alerting: exception pattern recognition based on ELK
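The article does not describe how the patterns are recognized. One of the simplest rules such a system might apply is spike detection: raise an alert when the same exception type occurs at least N times within a time window. The sketch below shows that rule in plain Java, independent of ELK; the class name, the threshold semantics, and the assumption that timestamps arrive in non-decreasing order are all illustrative choices.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Flags an exception type once it occurs 'threshold' times within 'windowMillis'. Names are hypothetical.
class ExceptionSpikeDetector {
    private final int threshold;
    private final long windowMillis;
    private final Map<String, Deque<Long>> timestampsByType = new HashMap<>();

    ExceptionSpikeDetector(int threshold, long windowMillis) {
        if (threshold < 1 || windowMillis < 1) {
            throw new IllegalArgumentException("threshold and windowMillis must be positive");
        }
        this.threshold = threshold;
        this.windowMillis = windowMillis;
    }

    // Records one occurrence and returns true when this type has reached the threshold
    // inside the window. Timestamps are assumed to arrive in non-decreasing order.
    synchronized boolean record(String exceptionType, long timestampMillis) {
        Deque<Long> timestamps = timestampsByType.computeIfAbsent(exceptionType, k -> new ArrayDeque<>());
        timestamps.addLast(timestampMillis);
        // Drop occurrences that have fallen out of the window; the entry just added always stays
        while (timestampMillis - timestamps.peekFirst() >= windowMillis) {
            timestamps.removeFirst();
        }
        return timestamps.size() >= threshold;
    }
}
```

The exception type and timestamp used here correspond to two of the fields that ExceptionMonitor sends in each ExceptionMetric.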

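Summary

In a distributed system, exception handling is more than a simple try-catch; it requires a complete system of defenses:

Global exception interception: a single, unified exit for exceptions
Service governance: the three moves of circuit breaking, rate limiting, and degradation
Transaction compensation: a guarantee of eventual consistency
Intelligent monitoring: fast location of the root cause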

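When system throughput grew from 100 TPS to 5,000 TPS, our exception handling system stood up to the test of real traffic. Remember: a good exception handling approach does not eliminate exceptions; it lets the system coexist with them gracefully.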
