A pitfall in the Gin community timeout middleware: unexpected Pod restarts in production
In a recent project we hit a production incident caused by the Gin timeout middleware (timeout): Pods exited abnormally and were restarted.
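The incident
The Pod restarted for no apparent reason. We captured the standard output logs, and they pointed to the timeout middleware.
The stack trace in the logs reported a concurrent map write.
Why would there be concurrent writes? The error pointed to the community timeout middleware for Gin, so we searched the project for related issues and, sure enough, found one: https://github.com/gin-contrib/timeout/pull/55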
Our wrapper code
func timeoutMiddleWare(timeoutInt int) gin.HandlerFunc {
	return timeout.New(
		timeout.WithTimeout(time.Duration(timeoutInt)*time.Second),
		timeout.WithResponse(func(c *gin.Context) {
			c.JSON(http.StatusGatewayTimeout, response.Failed(http.StatusGatewayTimeout, nil))
		}),
	)
}
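Reproduction and root cause
The cause first:
The timeout middleware starts an extra goroutine to run the business handler, while the middleware's own timeout logic stays in the request goroutine. When a request times out, both goroutines can write to the response at the same time. In v1.0.1 the timeout branch temporarily points c.Writer back at the original writer before calling the timeout response, so a handler that is still running can also reach that writer. Gin's response path writes to the header, which is a plain map, so the two writes collide as a concurrent map write. Go's runtime treats this as a fatal error ("fatal error: concurrent map writes") that recover() cannot catch, so the process exits, the Pod's container terminates, and K8s restarts it right away.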
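The failure mode can be sketched with the standard library alone. This is a minimal illustration, not the middleware's code: lockedHeader and writeFromTwoGoroutines are hypothetical names made up for the sketch. Two goroutines write to the same http.Header, which is just a map[string][]string. The writes are guarded by a mutex so that the sketch runs safely; without the lock, these would be exactly the unsynchronized map writes the Go runtime aborts on.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

// lockedHeader is a hypothetical helper for this sketch: an http.Header
// (a plain map[string][]string) guarded by a mutex.
type lockedHeader struct {
	mu sync.Mutex
	h  http.Header
}

// Set stores a header value while holding the lock.
func (l *lockedHeader) Set(key, value string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.h.Set(key, value)
}

// writeFromTwoGoroutines mimics the incident: a "handler" goroutine and a
// "timeout" goroutine both write headers on the same response.
func writeFromTwoGoroutines() http.Header {
	resp := &lockedHeader{h: make(http.Header)}

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // business handler that is still running
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			resp.Set("X-Handler", "done")
		}
	}()
	go func() { // timeout response being written
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			resp.Set("Content-Type", "application/json; charset=utf-8")
		}
	}()
	wg.Wait()
	return resp.h
}

func main() {
	h := writeFromTwoGoroutines()
	fmt.Println(h.Get("X-Handler"), h.Get("Content-Type"))
}
```

Because the runtime's detection of concurrent map writes is best effort, an unguarded version may run cleanly many times before it crashes, which fits an incident that only shows up occasionally under production load.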
Source code analysis:
// github.com/gin-contrib/timeout v1.0.1
var bufPool *BufferPool

const (
	defaultTimeout = 5 * time.Second
)

// New wraps a handler and aborts the process of the handler if the timeout is reached
func New(opts ...Option) gin.HandlerFunc {
	t := &Timeout{
		timeout:  defaultTimeout,
		handler:  nil,
		response: defaultResponse,
	}

	// Loop through each option
	for _, opt := range opts {
		if opt == nil {
			panic("timeout Option not be nil")
		}

		// Call the option giving the instantiated
		opt(t)
	}

	if t.timeout <= 0 {
		return t.handler
	}

	bufPool = &BufferPool{}

	return func(c *gin.Context) {
		finish := make(chan struct{}, 1)
		panicChan := make(chan interface{}, 1)

		w := c.Writer
		buffer := bufPool.Get()
		tw := NewWriter(w, buffer)
		c.Writer = tw
		buffer.Reset()

		// A separate goroutine is started here to run the business logic
		go func() {
			defer func() {
				if p := recover(); p != nil {
					panicChan <- p
				}
			}()
			t.handler(c)
			finish <- struct{}{}
		}()

		select {
		case p := <-panicChan:
			tw.FreeBuffer()
			c.Writer = w
			panic(p)

		case <-finish:
			c.Next()
			tw.mu.Lock()
			defer tw.mu.Unlock()
			dst := tw.ResponseWriter.Header()
			for k, vv := range tw.Header() {
				dst[k] = vv
			}
			tw.ResponseWriter.WriteHeader(tw.code)
			if _, err := tw.ResponseWriter.Write(buffer.Bytes()); err != nil {
				panic(err)
			}
			tw.FreeBuffer()
			bufPool.Put(buffer)

		case <-time.After(t.timeout):
			c.Abort()
			tw.mu.Lock()
			defer tw.mu.Unlock()
			tw.timeout = true
			tw.FreeBuffer()
			bufPool.Put(buffer)

			// v1.0.1: the code that triggers the error
			c.Writer = w
			t.response(c)
			c.Writer = tw

			// v1.1.0: the code after the fix PR (replaces the three lines above)
			cc := c.Copy() // respond through a fresh copy of gin.Context
			cc.Writer = w
			t.response(cc)
		}
	}
}

// t.response actually calls gin.Context.String()
func defaultResponse(c *gin.Context) {
	c.String(http.StatusRequestTimeout, http.StatusText(http.StatusRequestTimeout))
}

// gin source, v1.10.0:
func (c *Context) String(code int, format string, values ...interface{}) {
	c.Render(code, render.String{Format: format, Data: values})
}

// Render writes the response headers and calls render.Render to render data.
func (c *Context) Render(code int, r render.Render) {
	c.Status(code)

	if !bodyAllowedForStatus(code) {
		// The key is here
		r.WriteContentType(c.Writer)
		c.Writer.WriteHeaderNow()
		return
	}

	// r.Render also sets Content-Type through writeContentType
	if err := r.Render(c.Writer); err != nil {
		panic(err)
	}
}

// WriteContentType (JSON) writes JSON ContentType.
func (r JSON) WriteContentType(w http.ResponseWriter) {
	writeContentType(w, jsonContentType)
}

// !!! WriteContentType ultimately writes into the header (a map), which causes the concurrency problem !!!
func writeContentType(w http.ResponseWriter, value []string) {
	header := w.Header()
	if val := header["Content-Type"]; len(val) == 0 {
		header["Content-Type"] = value
	}
}
Community fix
Fix details, related PR: fix(response): conflict when handler completed and concurrent map writes by demouth · Pull Request #
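The idea behind responding through a copy can be sketched without Gin. This is a simplified model under stated assumptions, not the library's implementation: fakeContext and respondOnCopy are hypothetical names, and the shallow struct copy only stands in for gin.Context.Copy(), which does more than that. The point it illustrates is that the shared context is never redirected to the real header map, so a handler that is still running keeps writing to the buffered headers while the timeout response alone touches the real ones.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

// fakeContext is a hypothetical stand-in for gin.Context: handlers reach the
// response headers through its Header field.
type fakeContext struct {
	Header http.Header
}

// respondOnCopy simulates the timeout branch in the style of the fix: the
// shared context is left untouched, and the timeout response is written
// through a copy that points at the real header map.
func respondOnCopy() (realHdr, buffered http.Header) {
	realHdr = make(http.Header)  // headers of the real ResponseWriter
	buffered = make(http.Header) // headers of the buffering writer
	c := &fakeContext{Header: buffered}

	timedOut := make(chan struct{})
	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // slow business handler, still running after the timeout
		defer wg.Done()
		<-timedOut
		c.Header.Set("X-Handler", "late") // lands in the buffered map
	}()

	close(timedOut) // the timeout fires
	cc := *c        // shallow copy (assumption: stands in for c.Copy())
	cc.Header = realHdr
	cc.Header.Set("Content-Type", "text/plain; charset=utf-8")

	wg.Wait()
	return realHdr, buffered
}

func main() {
	realHdr, buffered := respondOnCopy()
	fmt.Println("real:", realHdr)
	fmt.Println("buffered:", buffered)
}
```

In the v1.0.1 style, the equivalent of `c.Header = realHdr` on the shared context would let the late handler write into the same map as the timeout response, which is the collision described earlier.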
Solution
Every project that pulls in github.com/gin-contrib/timeout (go get github.com/gin-contrib/timeout) needs to upgrade:
- Go version ≥ 1.23.0
- Pull the updated package:
go get github.com/gin-contrib/timeout@v1.1.0