openresty 日志輸出的處理

最近出了個故障，有個接口的請求居然出現了長達幾十秒的處理時間，由于日志缺乏，網絡故障也解除了，就沒法再重現這個故障了。為了可以在下次出現問題的時候能追查到問題，所以需要添加一些追蹤日志。
添加這些追蹤日志，我希望能夠達到如下幾點：

1、只有請求超過一定時間才記錄，不然請求太多，系統扛不住

2、添加的代碼可以盡量的少

3、對接口的影響盡量小，比如不影響實際時延，甚至記錄日志時出現了錯誤，也不影響系統正常運行

openresty這套工具，可以在nginx處理請求的每一個階段介入，編寫代碼進行邏輯處理。其可介入的流程如下圖：
圖片描述

log Phase這個階段，就是openresty能處理的最后階段。到這個階段的時候，實際上請求的響應已經發送給客戶端了。所以使用 log_by_lua (知乎真特么蛋疼啊，左右下劃線就自動斜體，還沒提供轉義功能)

log Phase這個階段，就是openresty能處理的最后階段。到這個階段的時候，實際上請求的響應已經發送給客戶端了。另外我也測試過了，即使在這個階段發生了錯誤，如 io 錯誤，也不會影響接口的正常響應，所以使用 log_by_lua 很是符合需求。

好處不止如此， log_by_lua是一個請求的最后處理階段，那么只要請求正常進行，比如會走到這一步，因此，在這一步，我們就知道了這個請求的耗時了。另外，則是我們的代碼里有不少的 ngx.exit ，如果是在業務邏輯處理的時候就記錄日志，那么每個出現 ngx.exit 的地方，都需要插入寫日志到硬盤的操作，大大增加了代碼量。

寫日志到硬盤的這一步操作，可以在 log_by_lua 這個階段來完成，剩下的另一個問題就是每一步記錄的日志如何傳遞到 log_by_lua 這一階段來了。

我處理的方式是使用ngx.ctx, 每一個請求，都會有自己獨立的 ngx.ctx, 這個 ngx.ctx 會貫穿整個請求的始終，簡單的log函數如下：

logger.lua
--------------------------
local _M = {}function _M.log(format, ...)if ngx.ctx.log_slot == nil thenngx.ctx.log_slot = {}endarg = {...}local logstr = ""if arg == nil thenlogstr = formatelselogstr = string.format(format, unpack(arg))endlogstr = logstr .. "\t" .. ngx.now()table.insert(ngx.ctx.log_slot, logstr)
endreturn _M

到了 log_by_lua 階段要把追蹤日志寫入到硬盤里，處理代碼如下：

log_slot.lua
---------------------
local request_time = ngx.var.request_time
if request_time < 1 thenreturn  --- 小于1秒的請求不記錄
end
local slot = ngx.ctx.log_slot
if slot == nil or type(slot) ~= "table" thenreturn
end
local logs = table.concat(slot, "\n")
local f = assert(io.open("/logs/trace", "a"))
f:write(logs .. "\n")
f:close()

log_by_lua 可以用在 http 模塊，也可以用在server模塊，也能直接精確到location模塊，即只到某個請求。所以你可以在nginx.conf 里的http里添加:

http{log_by_lua_file '/code/log_slot.lua';
}

也可以在server的配置里添加：

server {log_by_lua_file '/code/log_slot.lua';
}

更能直接在某個接口里添加：

/v1/test {content_by_lua_file '/code/v1/test.lua';log_by_lua_file '/code/log_slot.lua';
}

http里添加，則對所有的server； server里添加，則只針對此server；location里添加，就只針對這個接口。

但是，比較坑爹的是，log_by_lua 不像 access log，可以多層級使用。log_by_lua 在某層使用了之后，上層的 log_by_lua 就對此一層無效了。比如 /v1/test 接口添加了 log_by_lua，那么 http 或者 server 里添加的 log_by_lua 在接受/v1/test接口的請求時都不會被用到。

正是因為這個坑，浪費了我不少的時間來解決。我們的系統里，http 模塊是配置了 log_by_lua 的，用來做接口監控，監控返回的錯誤碼，處理的時延等。如果我在 /v1/test 里添加了只針對 /v1/test 的追蹤日志，那么接口監控就無法正常運行了。

不過天無絕人之路，我想到了一個處理方法如下：

monitor_log.lua
---------------------
local _M = {}function _M.monitor_log()local f = _M.api_monitor_log_funcif f == nil thenf, err = loadfile("/code/monitor.lua")if f == nil thenngx.log(ngx.ERR, "/code/monitor.lua, ", err)--- 如果不存在接口監控，直接給一個空函數f = function() endend_M.api_monitor_log_func = fendlocal status, err = pcall(f)if not status thenngx.log(ngx.ERR, "run api monitor /code/monitor.lua failed", err)end
endreturn _M

修改log_slot.lua代碼如下：

local logger = require "code.monitor"
local request_time = ngx.var.request_time
logger.monitor_log()
if request_time < 1 thenreturn  --- 小于1秒的請求不記錄
end
local slot = ngx.ctx.log_slot
if slot == nil or type(slot) ~= "table" thenreturn
end
local logs = table.concat(slot, "\n")
local f = assert(io.open("/logs/trace", "a"))
f:write(logs .. "\n")
f:close()

如上，就可以進行其他層級的 log_by_lua 代碼運行了，皆大歡喜，問題解決了。
當系統并發請求較低的時候，worker夠用，則使用 log_by_lua 可以說是毫無壞處。當然，一旦 log_by_lua 出現了故障，如死循環，則會長時間占用worker，造成整個系統崩潰掉。

本文來自互聯網用戶投稿，該文觀點僅代表作者本人，不代表本站立場。本站僅提供信息存儲空間服務，不擁有所有權，不承擔相關法律責任。
如若轉載，請注明出處：http://www.pswp.cn/news/390216.shtml
繁體地址，請注明出處：http://hk.pswp.cn/news/390216.shtml
英文地址，請注明出處：http://en.pswp.cn/news/390216.shtml

如若內容造成侵權/違法違規/事實不符，請聯系多彩編程網進行投訴反饋email:809451989@qq.com，一經查實，立即刪除！