【GaussDB】使用gdb定位GaussDB編譯package報錯
背景
在某次遷移Oracle到GaussDB時,應用開發人員將改好的package在GaussDB里進行創建,沒有ERROR也沒有WARNING,但是編譯無效對象的時候報錯了。雖然已經找到了是哪個包編譯報錯,但是這個包有上萬行,而且里面也有好幾十個procedure,而報錯信息僅僅只有 ERROR: Failed to query the 323 type in the cache.
,沒有上下文信息,連行號都沒有,根本無從判斷是哪里出了問題。
基本排查
- 嘗試drop這個package,然后重建,再編譯,現象一樣
- 嘗試重啟數據庫以清空全局PLSQL緩存,再編譯,現象一樣
這意味著這個問題與緩存無關,大概率也與其他依賴對象無關,所以暫時針對這個package進行排查。
對于不會gdb調試的人來說,要排查這個問題只能對著這個package,使用二分法來刪除里面的procedure,直到刪到某個procedure前后報錯發生變化,但要注意里面的procedure的依賴。當時就這么一路刪,最后的確發現了原因,但是耗費的時間非常久,解決依賴關系時還要手動改代碼。
那么有沒有一種方式能迅速定位是哪個procedure的問題么?
當然有,那就是使用gdb直接進行內核級別調試,因為Gauss系數據庫編譯package時,都是會逐個對里面的每一個procedure和function進行編譯。
gdb調試前置準備
由于已經找到了觸發這個報錯的package特征,因此下面就用最小化模擬用例來進行演示:
測試用例
create package pkg_test_4 is
procedure p1(i1 in varchar2,i2 out varchar2,i3 out varchar2);
end;
/
create package body pkg_test_4 is
procedure p1(i1 in varchar2,i2 in varchar2,i3 out varchar2) isbeginnull;end;
end;
/
alter package pkg_test_4 compile;
執行效果
gaussdb=# alter package pkg_test_4 compile;
gaussdb=# create or replace package pkg_test_4 is
gaussdb$# procedure p1(i1 in varchar2,
gaussdb$# i2 out varchar2,
gaussdb$# i3 out varchar2);
gaussdb$# end;
gaussdb$# /
CREATE PACKAGE
gaussdb=# create or replace package body pkg_test_4 is
gaussdb$# procedure p1(i1 in varchar2,
gaussdb$# i2 in varchar2,
gaussdb$# i3 out varchar2) is
gaussdb$# begin
gaussdb$# null;
gaussdb$# end;
gaussdb$# end;
gaussdb$# /
CREATE PACKAGE BODY
gaussdb=# alter package pkg_test_4 compile;
ERROR: Failed to query the 323 type in the cache.
gaussdb=#
使用gdb調試找問題有個關鍵,就是這個問題最好是能穩定復現的,否則gdb抓不到報錯現場也很難分析問題。
另外,開始gdb調試前,一定要先把對應版本的符號表準備好,比較簡單的方式就是直接把符號表里的bin和lib解壓到GaussDB的bin和lib目錄。
在之前分析MogDB的問題時,我們內核研發有教過我可以使用 b errstart if elevel>19
設置斷點來斷住所有 ERROR
以上級別的報錯,但是這招在GaussDB似乎不行了
(gdb) b errstart if elevel>19
No symbol "elevel" in current context.
(gdb)
如果直接 b errstart
,是可以斷,但是會老是斷,根本沒法跑起來,因為這里就算沒報錯也會調用進來,幾乎所有線程都在頻繁走到這里,參考openGauss源碼中的錯誤級別代碼,里面連INFO/NOTICE都有
/* Error level codes */
#define DEBUG5 \10 /* Debugging messages, in categories of \* decreasing detail. */
#define DEBUG4 11
#define DEBUG3 12
#define DEBUG2 13
#define DEBUG1 14 /* used by GUC debug_* variables */
#define LOG \15 /* Server operational messages; sent only to \* server log by default. */
#define COMMERROR \16 /* Client communication problems; same as LOG \* for server reporting, but never sent to \* client. */
#define INFO \17 /* Messages specifically requested by user (eg \* VACUUM VERBOSE output); always sent to \* client regardless of client_min_messages, \* but by default not sent to server log. */
#define NOTICE \18 /* Helpful messages to users about query \* operation; sent to client and server log by \* default. */
#define WARNING \19 /* Warnings. NOTICE is for expected messages \* like implicit sequence creation by SERIAL. \* WARNING is for unexpected messages. */
#define ERROR \20 /* user error - abort transaction; return to \* known state */
#define VERBOSEMESSAGE \9 /* indicates to show verbose info for CN and DNs; \* for DNs means to send info back to CN */
/* Save ERROR value in PGERROR so it can be restored when Win32 includes* modify it. We have to use a constant rather than ERROR because macros* are expanded only when referenced outside macros.*/
#ifdef WIN32
#define PGERROR 20
#endif
#define FATAL 21 /* fatal error - abort process */
#define PANIC 22 /* take down the other backends with me *//* MAKE_SQLSTATE('P', '1', '0' , '0', '0')=96 */
#define CUSTOM_ERRCODE_P1 96
看一下 b errstart
會斷到哪里
(gdb) b errstart
Breakpoint 1 at 0x564904871b20: errstart. (3 locations)
(gdb) info b
Num Type Disp Enb Address What
1 breakpoint keep y <MULTIPLE>
1.1 y 0x0000564904871b20 in errstart(int, char const*, int, char const*, char const*)at /usr1/GaussDBKernel/server/opengauss/src/auxiliary/share/error/elog.cpp:4108
1.2 y 0x00007f6b9cd5e5b0 <errstart(int, char const*, int, char const*, char const*)@plt>
1.3 y 0x00007f6b9d03eb30 <errstart(int, char const*, int, char const*, char const*)@plt>
(gdb)
從這個斷點信息里來看,errstart
在 elog.cpp
的4108行,這很是可疑,因為無論是openGauss還是MogDB,這個errstart函數應該在更前面的位置,大概是第兩百多行的地方。
由于沒有源碼,就只能反匯編看下有沒有能參考的信息了
(gdb) disassemble /m errstart
Dump of assembler code for function errstart(int, char const*, int, char const*, char const*):
238 /usr1/GaussDBKernel/server/opengauss/src/auxiliary/share/error/elog.cpp: No such file or directory.0x0000564904871b25 <+5>: push %rbp0x0000564904871b26 <+6>: mov %rsp,%rbp0x0000564904871b29 <+9>: push %r150x0000564904871b2b <+11>: push %r140x0000564904871b2d <+13>: push %r130x0000564904871b2f <+15>: push %r120x0000564904871b31 <+17>: mov %rsi,%r140x0000564904871b34 <+20>: push %rbx0x0000564904871b35 <+21>: mov %edi,%ebx0x0000564904871b37 <+23>: sub $0x58,%rsp0x0000564904871b3e <+30>: mov %edx,-0x64(%rbp)0x0000564904871b41 <+33>: mov %rcx,-0x70(%rbp)0x0000564904871b45 <+37>: mov %r8,-0x78(%rbp)239 in /usr1/GaussDBKernel/server/opengauss/src/auxiliary/share/error/elog.cpp
240 in /usr1/GaussDBKernel/server/opengauss/src/auxiliary/share/error/elog.cpp
可以看到這個函數第一次出現其實是在第238行,從這幾個寄存器的操作來看,應該就是函數的入口,也就是說,實際上這個函數的定義應該在第238行,而不是前面的4108行。我觀察了下,4108行的elevel一直是被優化掉的,看不到里面的值,只有238行的elevel能觀測到值。
所以在GaussDB里要斷ERROR及以上級別的錯誤,斷點應該設置為
b elog.cpp:238 if (elevel > 19)
可以提前先準備好下面的命令,gdb進去后直接復制粘貼,減少進程中斷時間
b elog.cpp:238 if (elevel > 19)
handle SIGUSR1 nostop noprint
handle SIGUSR2 nostop noprint
handle SIGPIPE nostop
set pagi off
set print elements 300
continue
正式開始gdb調試
先用 ps -ef |grep gaussdb
找到進程號
然后gdb -p 進程號
[gaussdb506@ky10-sp3 ~]$ ps -ef |grep gaussdb
root 426694 426551 0 09:19 pts/0 00:00:00 su - gaussdb506
gaussdb+ 426699 426694 0 09:19 pts/0 00:00:00 -bash
gaussdb+ 427027 426699 0 09:19 pts/0 00:00:00 ps -ef
gaussdb+ 427028 426699 0 09:19 pts/0 00:00:00 grep gaussdb
og_last+ 3231792 1 1 Jul22 ? 05:58:52 /opt/og_lastest/openGauss-server/dest/bin/gaussdb
og700rc1 3508544 1 1 Aug04 ? 00:33:58 /opt/og700rc1/app/bin/gaussdb -D /opt/og700rc1/data -M primary
gaussdb+ 3864702 1 29 Aug05 ? 06:33:50 /data/gaussdb506/app/bin/gaussdb
[gaussdb506@ky10-sp3 ~]$ gdb -p 3864702
GNU gdb (GDB) KylinOS 9.2-3.p01.ky10
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-kylin-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:<http://www.gnu.org/software/gdb/documentation/>.For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 3864702
[New LWP 3864703]
[New LWP 3864749]
...#省略
[New LWP 4081944]warning: File "/usr/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
--Type <RET> for more, q to quit, c to continue without paging--
To enable execution of this file addadd-auto-load-safe-path /usr/lib64/libthread_db-1.0.so
line to your configuration file "/home/gaussdb506/.gdbinit".
To completely disable this security protection addset auto-load safe-path /
line to your configuration file "/home/gaussdb506/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual. E.g., run from the shell:info "(gdb)Auto-loading safe path"warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.warning: File "/usr/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
0x00007f6bbc53c849 in poll () from /usr/lib64/libc.so.6
(gdb) b elog.cpp:238 if (elevel > 19)
Breakpoint 1 at 0x564904871b25: file /usr1/GaussDBKernel/server/opengauss/src/auxiliary/share/error/elog.cpp, line 238.
(gdb) handle SIGUSR1 nostop noprint
Signal Stop Print Pass to program Description
SIGUSR1 No No Yes User defined signal 1
(gdb) handle SIGUSR2 nostop noprint
Signal Stop Print Pass to program Description
SIGUSR2 No No Yes User defined signal 2
(gdb) handle SIGPIPE nostop
Signal Stop Print Pass to program Description
SIGPIPE No Yes Yes Broken pipe
(gdb) set pagi off
(gdb) set print elements 300
(gdb) continue
Continuing.
[New LWP 427599]
[New LWP 427600]
[LWP 427599 exited]
[New LWP 427601]
[LWP 427601 exited]
[New LWP 427602]
[LWP 427602 exited]
[LWP 427600 exited]
當后面不斷有輸出 [New LWP xxxxxx]
時,gaussdb就是正常運行中了。
接下來可以開一個客戶端連接,執行上面用于模擬測試的sql,會卡在 alter package pkg_test_4 compile;
這個語句上,同時gdb的窗口不再連續打印 [New LWP xxxxxx]
,而是命中了斷點
[Switching to LWP 3864766]Thread 16 "TPLworker" hit Breakpoint 1, errstart (elevel=20, filename=0x564908abfcc8 "format_type.cpp", lineno=216, funcname=0x564908abfdd0 <format_type_internal(unsigned int, int, bool, bool, bool)::__func__> "format_type_internal", domain=0x5649087e1004 "plpgsql-9.2") at /usr1/GaussDBKernel/server/opengauss/src/auxiliary/share/error/elog.cpp:238
238 in /usr1/GaussDBKernel/server/opengauss/src/auxiliary/share/error/elog.cpp
(gdb)
接著輸入bt查看堆棧
(gdb) bt
#0 errstart (elevel=20, filename=0x564908abfcc8 "format_type.cpp", lineno=216, funcname=0x564908abfdd0 <format_type_internal(unsigned int, int, bool, bool, bool)::__func__> "format_type_internal", domain=0x5649087e1004 "plpgsql-9.2") at /usr1/GaussDBKernel/server/opengauss/src/auxiliary/share/error/elog.cpp:238
#1 0x00005649043cc8ae in format_type_internal (type_oid=323, typemod=-1, typemod_given=<optimized out>, allow_invalid=<optimized out>, include_nspname=<optimized out>) at /usr1/GaussDBKernel/server/opengauss/src/auxiliary/share/adt/format_type.cpp:212
#2 0x00005649045b82e8 in format_procedure (procedure_oid=<optimized out>) at /usr1/GaussDBKernel/server/opengauss/src/auxiliary/share/adt/regproc.cpp:473
#3 0x0000564904a2d53d in do_compile (fcinfo=0x7f6a520460c0, proc_tup=0x7f69b7c375a0, func=0x7f6a39c64050, compile_func_head_info=0x7f6a52046740, for_validator=true, hashkey=0x7f6a52045d50) at /usr1/GaussDBKernel/server/opengauss/src/gausskernel/pl/plsql/pl_comp/pl_comp_func_main.cpp:921
#4 0x0000564904a35f3b in gsplsql_compile (fcinfo=0x7f6a520460c0, compile_func_head_info=0x7f6a52046740, for_validator=true, isRecompile=false, func_runtime_state=0x0) at /usr1/GaussDBKernel/server/opengauss/src/gausskernel/pl/plsql/pl_comp/pl_comp_func_main.cpp:3106
#5 0x0000564906c64eeb in plpgsql_validator (fcinfo=<optimized out>) at /usr1/GaussDBKernel/server/opengauss/src/compatibility/sql_adaptor/pl/plpgsql/src/pl_handler.cpp:1481
#6 0x00005649048a56cb in OidFunctionCall4Coll (function_id=10790, collation=0, arg1=97664, arg2=0, arg3=0, arg4=140094619281216, is_null=0x0) at /usr1/GaussDBKernel/server/opengauss/src/auxiliary/share/fmgr/fmgr.cpp:2512
#7 0x0000564904a55363 in gsplsql_func_in_pkg_compile (pkg=0x7f6a455ec050) at /usr1/GaussDBKernel/server/opengauss/src/gausskernel/pl/plsql/pl_comp/pl_comp_pkg_main.cpp:1210
#8 0x0000564904a571fc in gsplsql_pkg_init (pkg=0x7f6a455ec050, isCreate=false, isSpec=false, ret_pkg_runtime=0x7f6a52046a18, is_need_compile_func=true, pkg_debug_query_string=<optimized out>, old_pkg_runtime=0x7f6a39a48050) at /usr1/GaussDBKernel/server/opengauss/src/gausskernel/pl/plsql/pl_comp/pl_comp_pkg_main.cpp:1545
#9 0x0000564904a584a3 in gsplsql_pkg_compile (pkg_oid=97663, for_validator=true, is_spec=false, is_create=false, is_recompile=true, pkg_runtime_state=0x0) at /usr1/GaussDBKernel/server/opengauss/src/gausskernel/pl/plsql/pl_comp/pl_comp_pkg_main.cpp:956
#10 0x0000564905100dc6 in recompile_single_package (package_oid=97663, is_spec=false) at /usr1/GaussDBKernel/server/opengauss/src/compatibility/sql_adaptor/commands/packagecmds.cpp:329
#11 0x0000564905101212 in recompile_package_by_oid (pkg_oid=97663, recompile_invalid_pkg=false) at /usr1/GaussDBKernel/server/opengauss/src/compatibility/sql_adaptor/commands/packagecmds.cpp:416
#12 0x0000564905101262 in recompile_package (stmt=0x7f6a3f4954c0) at /usr1/GaussDBKernel/server/opengauss/src/compatibility/sql_adaptor/commands/packagecmds.cpp:437
#13 0x0000564905431abe in sqlcmd_standard_process_utility (parse_tree=0x7f6a3f4954c0, query_string=0x7f6a3f496050 "alter package pkg_test_4 compile", params=0x0, is_top_level=<optimized out>, dest=0x56490a6e0720 <donothingDR>, sent_to_remote=<optimized out>, completion_tag=0x7f6a5204a430 "", isCTAS=false) at /usr1/GaussDBKernel/server/opengauss/src/auxiliary/proc/tcop/utility.cpp:6813
#14 0x00007f6b9cd9b759 in gsaudit_ProcessUtility_hook (parsetree=0x7f6a3f4954c0, queryString=0x7f6a3f496050 "alter package pkg_test_4 compile", params=0x0, isTopLevel=<optimized out>, dest=0x56490a6e0720 <donothingDR>, sentToRemote=<optimized out>, completionTag=0x7f6a5204a430 "", isCTAS=false) at /usr1/GaussDBKernel/server/opengauss/src/gausskernel/security/security_plugin/security_policy_plugin.cpp:856
#15 0x00005649059a0f52 in audit_process_utility (parsetree=0x7f6a3f4954c0, query_string=0x7f6a3f496050 "alter package pkg_test_4 compile", params=<optimized out>, is_top_level=<optimized out>, dest=<optimized out>, sent_to_remote=<optimized out>, completion_tag=0x7f6a5204a430 "", is_ctas=false) at /usr1/GaussDBKernel/server/opengauss/src/gausskernel/security/audit/security_auditfuncs.cpp:1512
#16 0x000056490543c71d in sqlcmd_process_utility (parse_tree=0x7f6a3f4954c0, query_string=0x7f6a3f496050 "alter package pkg_test_4 compile", params=0x0, is_top_level=<optimized out>, dest=<optimized out>, sent_to_remote=<optimized out>, completion_tag=0x7f6a5204a430 "", isCTAS=false) at /usr1/GaussDBKernel/server/opengauss/src/auxiliary/proc/tcop/utility.cpp:1974
#17 0x000056490541d83f in PortalRunUtility (portal=0x7f6a48878050, utilityStmt=0x7f6a3f4954c0, isTopLevel=true, dest=0x56490a6e0720 <donothingDR>, completionTag=0x7f6a5204a430 "") at /usr1/GaussDBKernel/server/opengauss/src/auxiliary/proc/tcop/pquery.cpp:2140
#18 0x000056490541f0be in PortalRunMulti (portal=0x7f6a48878050, isTopLevel=true, dest=0x56490a6e0720 <donothingDR>, altdest=0x56490a6e0720 <donothingDR>, completionTag=0x7f6a5204a430 "", snapshot=0x0, bii_state=0x0) at /usr1/GaussDBKernel/server/opengauss/src/auxiliary/proc/tcop/pquery.cpp:2326
#19 0x00005649054232dc in PortalRun (portal=0x7f6a48878050, count=200, isTopLevel=true, dest=0x7f6a3f4a84d0, altdest=0x7f6a3f4a84d0, completionTag=0x7f6a5204a430 "", snapshot=0x0, bii_state=0x0) at /usr1/GaussDBKernel/server/opengauss/src/auxiliary/proc/tcop/pquery.cpp:1501
#20 0x00005649054158af in exec_execute_message (max_rows=200, portal_name=0x7f6a3f4a8050 "") at /usr1/GaussDBKernel/server/opengauss/src/auxiliary/proc/tcop/postgres.cpp:7071
#21 gs_process_command (firstchar=<optimized out>, input_message=<optimized out>, send_ready_for_query=<optimized out>) at /usr1/GaussDBKernel/server/opengauss/src/auxiliary/proc/tcop/postgres.cpp:12314
#22 0x000056490541b9c0 in PostgresMain (argc=<optimized out>, argv=0x7f6a49e45b20, dbname=<optimized out>, username=<optimized out>) at /usr1/GaussDBKernel/server/opengauss/src/auxiliary/proc/tcop/postgres.cpp:11313
#23 0x000056490539f2df in backend_run (port=0x7f6a5204a890) at /usr1/GaussDBKernel/server/opengauss/src/auxiliary/proc/postmaster/postmaster.cpp:12482
#24 0x00005649053de1b0 in gauss_db_worker_thread_main<(knl_thread_role)2> (arg=<optimized out>) at /usr1/GaussDBKernel/server/opengauss/src/auxiliary/proc/postmaster/postmaster.cpp:19086
#25 0x000056490539f39a in internal_thread_func (args=<optimized out>) at /usr1/GaussDBKernel/server/opengauss/src/auxiliary/proc/postmaster/postmaster.cpp:20196
#26 0x00007f6bbc60ff1b in ?? () from /usr/lib64/libpthread.so.0
#27 0x00007f6bbc547320 in clone () from /usr/lib64/libc.so.6
可以看到在報錯的 format_type_internal
里,出現了type_oid=323,的確是在報錯中出現的數字,但是323這么小的數字明顯不可能是用戶自定義類型,因為小數字的oid都是被系統保留的。這里肯定是有bug的,但沒有源碼不方便找bug原因,本次調試的主要目的是找到出錯的procedure。
在 format_type_internal
和 format_procedure
里,有用的參數都顯示成 <optimized out>
了,這表示內核把這些變量優化掉了,不給看。于是繼續看下一行 do_compile
,打印幾個參數看看
(gdb) f 3
#3 0x0000564904a2d53d in do_compile (fcinfo=0x7f6a520460c0, proc_tup=0x7f69b7c375a0, func=0x7f6a39c64050, compile_func_head_info=0x7f6a52046740, for_validator=true, hashkey=0x7f6a52045d50) at /usr1/GaussDBKernel/server/opengauss/src/gausskernel/pl/plsql/pl_comp/pl_comp_func_main.cpp:921
921 /usr1/GaussDBKernel/server/opengauss/src/gausskernel/pl/plsql/pl_comp/pl_comp_func_main.cpp: No such file or directory.
(gdb) p *fcinfo
$1 = {flinfo = 0x7f6a52046020, context = 0x0, resultinfo = 0x0, fncollation = 0, isnull = false, nargs = 0, arg = 0x7f6a39a4ca50, argnull = 0x0, argTypes = 0x0, prealloc_arg = {0 <repeats 20 times>}, prealloc_argnull = {false <repeats 20 times>}, prealloc_argTypes = {0 <repeats 20 times>}, argVector = 0x0, refcursor_data = {argCursor = 0x0, returnCursor = 0x0, return_number = 0}, out_tmtype = 0 '\000', out_decimals = 0 '\000', udfInfo = {UDFArgsHandlerPtr = 0x0, UDFResultHandlerPtr = 0x0, udfMsgBuf = 0x0, msgReadPtr = 0x0, argBatchRows = 0, allocRows = 0, arg = 0x0, null = 0x0, result = 0x0, resultIsNull = 0x0, valid_UDFArgsHandlerPtr = false}, swinfo = {sw_econtext = 0x0, sw_exprstate = 0x0, sw_is_flt_frame = false}, out_typmode = 0x0, fn_typmode = 0, plfunc_exec_mode = 0, plfunc_exec_state = 0x0, args_done = 0x0, prealloc_args_done = {0 <repeats 20 times>}, arginfo = {{in_tmtype = 0 '\000', in_decimals = 0 '\000', argTypModes = 0, set_enum_typeoid = 0}}}
(gdb) p *proc_tup
Attempt to dereference a generic pointer.
(gdb) p *func
$2 = {type = T_PLpgSQL_FUNCTION, fn_oid = 97664, pkg_oid = 97663, namespaceOid = 2200, fn_owner = 16728, fn_input_collation = 0, fn_signature = 0x0, fn_searchpath = 0x0, namespace_searchpath = 0x0, fn_hashkey = 0x0, fn_cxt = 0x7f6a6a9599d0, fn_rettype = 0, fn_rettyplen = 0, glc_func_life = 1, fn_rettypioparam = 0, fn_retbyval = false, fn_retistuple = false, fn_retset = false, fn_readonly = false, out_param_varno = -1, found_varno = 0, fn_nallargs = 0, argmods = 0x0, argtypes = 0x0, sql_cursor_found_varno = 0, sql_notfound_varno = 0, sql_isopen_varno = 0, sql_rowcount_varno = 0, sql_bulk_exceptions_varno = 0, sqlcode_varno = 0, sqlstate_varno = 0, sqlerrm_varno = 0, new_varno = 0, old_varno = 0, tg_name_varno = 0, tg_when_varno = 0, tg_level_varno = 0, tg_op_varno = 0, tg_relid_varno = 0, tg_relname_varno = 0, tg_table_name_varno = 0, tg_table_schema_varno = 0, tg_nargs_varno = 0, tg_argv_varno = 0, retvarno = 0, guc_stat = 5, use_count = 0, resolve_option = GSPLSQL_RESOLVE_COLUMN, ndatums = 0, datums = 0x0, datum_need_free = 0x0, action = 0x0, goto_labels = 0x0, invalItems = 0x0, saved_unique_id = 4294967295, nPlaceHolders = 0, placeholders = 0x0, cur_estate = 0x0, tg_relation = 0x0, debug = 0x0, ns_top = 0x0, is_private = false, fn_is_trigger = false, pre_parse_trig = false, is_autonomous = false, is_inline_handler = false, is_valid = true, is_plpgsql_func_with_outparam = false, need_skip_process_autonm_pkg = false, remembered_by_resowner = false, typeList = 0x0, namespace_name = 0x0, expr_list = 0x0, fn_retinput = {fn_addr = 0x0, fn_oid = 0, fn_nargs = 0, fn_strict = false, fn_retset = false, fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0, fn_rettype = 0, fn_rettypemod = 0, fnName = '\000' <repeats 63 times>, fnLibPath = 0x0, vec_fn_addr = 0x0, vec_fn_cache = 0x0, genericRuntime = 0x0, max_length = 0, fn_languageId = 0, fn_stats = 0 '\000', fn_fenced = false, fn_volatile = 0 '\000', decimals = 0 '\000'}, glc_status = {m_type = GLC_FUNCTION_OBJ, m_location = GLC_OBJECT_IN_SESSION_WAIT_REMOVE, m_glc_object_state = GLC_OBJECT_IS_VALID, m_refcount = 1}, expired_cell = {dle_next = 0x0, dle_prev = 0x0, dle_val = 0x7f6a39c64050, dle_list = 0x7f6a48876ea0}, compiled_dlist_elem = {dle_next = 0x0, dle_prev = 0x0, dle_val = 0x7f6a39c64050, dle_list = 0x0}, parent_pro_ndatum = 0, subparam = 0x0, fn_nargs = 0, copiable_size = 0, deep_datums = 0x0, deep_ndatums = 0, cursor_datums = 0x0, cursor_ndatums = 0, placeholder_datums = 0x0, placeholder_ndatums = 0, fn_argvarnos = 0x0, depend_info_list = 0x0, plan_total_mem_size = 0, block_level = 0x0}
(gdb)
可以在 func
里看到,fn_oid=97664
,這意味著是在編譯pg_proc里oid為97664的對象。于是我們輸入q退出gdb,然后回到客戶端查詢
gaussdb=# select proname,g.pkgname from pg_proc p,gs_package g where p.oid=97664 and g.oid=p.propackageid;proname | pkgname
---------+------------p1 | pkg_test_4
(1 row)
可以看到這個oid對應的就是pkg_test_4這個包里的p1,于是就知道了一定是編譯p1的時候出了問題。
到此,出問題的procedure就直接找出來了,肉眼一看包頭和包體的定義,發現有個參數的in/out方向沒匹配,但GaussDB在創建這個package時竟然沒有報錯…
其他Gauss系數據庫的情況
同樣的這個代碼,在openGauss 7.0.0 RC1 是不會報錯的,這個package還能正常調用,查了下數據字典,出入參方向是按包體生效的,這同樣也是個BUG,沒有做嚴格判斷。
openGauss=# create package pkg_test_4 is
openGauss$# procedure p1(i1 in varchar2,
openGauss$# i2 out varchar2,
openGauss$# i3 out varchar2);
openGauss$# end pkg_test_4;
openGauss$# /
;end;
end pkg_test_4;
/CREATE PACKAGE
openGauss=# create package body pkg_test_4 is
openGauss$# procedure p1(i1 in varchar2,
openGauss$# i2 in varchar2,
openGauss$# i3 out varchar2) is
openGauss$# begin
openGauss$# null;
openGauss$# end;
openGauss$# end pkg_test_4;
openGauss$# /
CREATE PACKAGE BODY
openGauss=# alter package pkg_test_4 compile;
ALTER PACKAGE
openGauss=# call pkg_test_4.p1(null,null,null);i3
----(1 row)openGauss=#
在MogDB 5.2.0里則是在創建package body時就報錯了,能正確檢查到包頭里的procedure在包體里沒定義
MogDB=# create package pkg_test_4 is
MogDB$# procedure p1(i1 in varchar2,
MogDB$# i2 out varchar2,
MogDB$# i3 out varchar2);
MogDB$# end pkg_test_4;
MogDB$# /
CREATE PACKAGE
MogDB=# create package body pkg_test_4 is
MogDB$# procedure p1(i1 in varchar2,
MogDB$# i2 in varchar2,
MogDB$# i3 out varchar2) is
MogDB$# begin
MogDB$# null;
MogDB$# end;
MogDB$# end pkg_test_4;
MogDB$# /
ERROR: Function definition not found: p1
MogDB=#
總結
本次觸發 ERROR: Failed to query the 323 type in the cache.
這個報錯的直接原因是創建的package和package body中,有個procedure的參數in/out方向不匹配導致。雖然客戶代碼的確有問題,但根本原因還是數據庫有BUG,未將這種異常場景檢查出來。
想要深入排查國產數據庫使用中的問題,學會使用gdb是必不可少的。我曾參與過不少國產數據庫PoC,親眼看到各個數據庫廠家的技術人員在客戶現場都曾用過gdb調試來定位問題。雖然大部分排行靠前的國產數據庫都基本已經穩定應用在各行各業了,但是仍要注意一些不起眼的小角落是否還有蟲子。
- 本文作者: DarkAthena
- 本文鏈接: https://www.darkathena.top/archives/Debugging-GaussDB-Locating-Package-Compilation-Errors-with-GDB
- 版權聲明: 本博客所有文章除特別聲明外,均采用CC BY-NC-SA 3.0 許可協議。轉載請注明出處