背景
DDL操作通常耗時較長,特別是涉及補數據流程的DDL語句。在執行過程中,用戶面臨兩個主要痛點:一是無法實時獲取DDL執行進度,難以區分長時間運行是正常現象還是由內部異常導致的停滯;二是執行效率經常低于預期,而傳統的問題診斷需要跨越多張系統表和視圖,操作流程復雜。為此,OceanBase 的新版本通過提供可查詢的系統視圖來簡化DDL性能問題排查。
使用說明
SESSION_LONGOPS視圖輸出行為變更
變更了 SESSION_LONGOPS視圖中 DDL 處于REDEFINITION階段時 message列的輸出邏輯 。其他保持和原有的一致。
DDL 處于REDEFINITION階段即通過發生innsql完成補數據流程圖。通過sql的算子實現。整個過程主要分為 scan,sort, insert三個階段。在以前版本的行為中,無論具體執行到了哪一個階段,都會返回這三個階段處理的行數。如下所示
// 不使用column group
+----------------------------------------------------------------------------------------------------------------------------+
| MESSAGE |
+----------------------------------------------------------------------------------------------------------------------------+
| TENANT_ID: %d, TASK_ID: %d, STATUS: REPLICA BUILD, PARALLELISM: %ld, ROW_SCANNED: %ld, ROW_SORTED: %ld, ROW_INSERTED: %ld |
+----------------------------------------------------------------------------------------------------------------------------+// 使用 column group
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| MESSAGE |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| TENANT_ID: %d, TASK_ID: %d, STATUS: REPLICA BUILD, PARALLELISM: %ld, ROW_SCANNED: %ld, ROW_SORTED: %ld, ROW_INSERTED_INTO_TMP_FILE: %ld, ROW_INSERTED: %ld out of %ld column group rows|
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
狀態定義
在新的方案中,我們統計了更加詳細的信息,但是一次展示過于復雜。因此分階段展示信息。考慮到多線程場景,不同線程執行速度可能不一致,因此考慮展示最慢線程所處階段的信息。階段定義如下。同時,無論處于哪個階段,還是會輸出總掃描行數,總排序行數,總插入行數這些整體信息。
enum class RedefinitionState
{BEFORESCAN = 0,SCAN,SORT-PHASE1,SORT-PHASE2,INSERT
};
排序階段分為 內存排序和外部排序,兩個需要分別展示,但是為了方便用戶理解,不暴露實現細節,使用 SORT-PHASE1 和 SORT-PHASE2 分別代表兩個階段。
BEFORESCAN
掃描開始前的階段,可能位于采樣等環節。
"TENANT_ID: %d, TASK_ID:%d STATUS: REPLICA BUILD, BEFORE-SCAN"
SCAN
掃描階段。因為還無法獲取總行數,所以掃描進度和預計用時無法計算,只能輸出已經掃描的總行數。 同時還會輸出掃描的并行度和 掃描已經花費的時間,另外還會輸出所有線程中掃描最多和最少的行數
TENANT_ID: %d, TASK_ID:%d STATUS: REPLICA BUILD, SCANNING, PARALLELISM: %ld,
ROW_ROUNT_INFO:{ ROW_SCANNED: %ld, ROW_SORTED: %ld, ROW_INSERTED: %ld },
SCAN_INFO:{ SCAN_TIME_ELAPSED: %.3fs, MAX_THREAD_ROW_SCANNED: %ld, MIN_THREAD_ROW_SCANNED: %ld }
示例
TENANT_ID: 1004, TASK_ID: 1000817, STATUS: REPLICA BUILD, SCANNING, PARALLELISM: 2,
ROW_COUNT_INFO:{ ROW_SCANNED: 30798887, ROW_SORTED: 0, ROW_INSERTED: 0 },
SCAN_INFO:{ SCAN_TIME_ELAPSED: 86.529s, MAX_THREAD_ROW_SCANNED: 15428902, MIN_THREAD_ROW_SCANNED: 15369985 }第一行是整個ddl_task的整體信息,執行的租戶id為1004,task_id為 1000817,處于掃描階段,掃描并行度為2第二行是掃描行數,總排序行數,總插入行數信息。第三行是為掃描階段具體信息,當前掃描已花費時間為 86.529s ,最多的線程掃了 15428902 行, 最少的線程掃了 15369985 行
SORT-PHASE1
內存排序階段。從此階段開始,不同線程之間進度獨立。輸出的階段和進度按照最慢統計。 為了不產生誤導PARALLELISM仍然展示所有的工作線程數(inmem_sort+merge_sort+insert),另外單獨輸出位于該階段的線程數。
輸出格式為
TENANT_ID: 1, TASK_ID: 3051020, STATUS: REPLICA BUILD, SORT_PHASE1, PARALLELISM: %ld, SORT_PHASE1_THREAD_NUM: %ld,
ROW_COUNT_INFO:{ ROW_SCANNED: %ld, ROW_SORTED: %ld, ROW_INSERTED: %ld },
SORT_PHASE1_PROGRESS_INFO:{ SORT_PHASE1_TIME_ELAPSED: %.3fs, SORT_PHASE1_PROGRESS: %.2f%%, SORT_PHASE1_TIME_REMAINING: %.3fs },
SLOWEST_THREAD_INFO:{ THREAD_ID: %ld, SORTED_ROW_COUNT: %ld }
示例
TENANT_ID: 1004, TASK_ID: 1000817, STATUS: REPLICA BUILD, SORT_PHASE1, PARALLELISM: 2, SORT_PHASE1_THREAD_NUM: 2,
ROW_COUNT_INFO:{ ROW_SCANNED: 32000000, ROW_SORTED: 30917952, ROW_INSERTED: 0 },
SORT_PHASE1_PROGRESS_INFO:{ SORT_PHASE1_TIME_ELAPSED: 137.501s, SORT_PHASE1_PROGRESS: 96.56%, SORT_PHASE1_TIME_REMAINING: 4.892s },
SLOWEST_THREAD_INFO:{ THREAD_ID: 2807314, SORTED_ROW_COUNT: 15458976 }第一行是整個ddl_task的整體信息,執行的租戶id為1004,task_id為 1000817,
當前階段為REPLICA BUILD的INMEM_SORT階段。當前并行度為2第二行總掃描行數,總排序行數,總插入行數信息的整體信息,第三行為階段的時間和進度信息,當前階段已進行137.501s, 當前階段進度為96.56%,預計還需要4.892s完成。第四行是最慢線程信息, 最慢線程的線程id為 2807314, 最慢線程處理了 15458976 行,
SORT-PHASE2
外部排序階段,前面和內存排序一樣, 但是因為涉及到落盤操作,會額外輸出 落盤io量和臨時文件壓縮類型等信息
TENANT_ID: 1004, TASK_ID: 1000817, STATUS: REPLICA BUILD, SORT_PHASE2, PARALLELISM: %ld, SORT_PHASE2_THREAD_NUM: %ld,
ROW_COUNT_INFO:{ ROW_SCANNED: %ld, ROW_SORTED: %ld, ROW_INSERTED: %ld },
SORT_PHASE2_PROGRESS_INFO:{ SORT_PHASE2_TIME_ELAPSED: %.3fs, SORT_PHASE2_PROGRESS: %.2f%%, SORT_PHASE2_TIME_REMAINING: %.3fs },
SLOWEST_THREAD_INFO:{ THREAD_ID: %ld, SORTRD_ROW_COUNT: %ld },
TEMP_FILE_INFO:{ DUMP_SIZE: %ld, COMPRESS_TYPE: %s }
示例
TENANT_ID: 1004, TASK_ID: 1000817, STATUS: REPLICA BUILD, SORT_PHASE2, PARALLELISM: 2, SORT_PHASE2_THREAD_NUM: 1,
ROW_COUNT_INFO:{ ROW_SCANNED: 32000000, ROW_SORTED: 121040340, ROW_INSERTED: 44300 },
SORT_PHASE2_PROGRESS_INFO:{ SORT_PHASE2_TIME_ELAPSED: 35.277s, SORT_PHASE2_PROGRESS: 85.49%, SORT_PHASE2_TIME_REMAINING: 5.986s },
SLOWEST_THREAD_INFO:{ THREAD_ID: 2807315, SORTRD_ROW_COUNT: 41013268 },
TEMP_FILE_INFO:{ DUMP_SIZE: 366975583, COMPRESS_TYPE: zstd_1.3.8 }第一行是整個ddl_task的整體信息,執行的租戶id為1004 ,task_id為 1000817,
當前階段為REPLICA BUILD的SORT-PHASE2階段。當前并行度為2,但是只有一個線程處于排序第二階段,說明其完成排序進入插入階段。第二行是總掃描行數,總排序行數,總插入行數信息。第三行是當前階段時間和進度信息。當前階段已進行35.277s,當前階段進度為85.49%,預計剩余5.986s完成。第四行是最慢線程信息,最慢線程的線程id為 2807315, 處理了 41013268行第五行是臨時文件存儲有關信息,當前總落盤量為 366975583, 臨時文件采用的壓縮算法為 zstd_1.3.8
insert
插入階段,對于column group > 1的表,與原版本保持一致。對于column group為1的表,新行為和sort相似
TENANT_ID: 1004, TASK_ID: 1000817, STATUS: REPLICA BUILD, INSERT, PARALLELISM: %ld, INSERT_THREAD: %ld,
ROW_COUNT_INFO:{ ROW_SCANNED: %ld, ROW_SORTED: %ld, ROW_INSERTED: %ld },
INSERT_PROGRESS_INFO:{ INSERT_TIME_ELAPSED: %.3fs, INSERT_PROGRESS: %.2f%%, INSERT_TIME_REMAINING: %.3fs },
SLOWEST_THREAD_INFO:{ THREAD_ID: %ld, INSERTED_ROW_COUNT: %ld }
示例
TENANT_ID: 1004, TASK_ID: 1000817, STATUS: REPLICA BUILD, INSERT, PARALLELISM: 2, INSERT_THREAD: 2,
ROW_COUNT_INFO:{ ROW_SCANNED: 32000000, ROW_SORTED: 128000000, ROW_INSERTED: 28806480 },
INSERT_PROGRESS_INFO:{ INSERT_TIME_ELAPSED: 121.941s, INSERT_PROGRESS: 86.61%, INSERT_TIME_REMAINING: 18.845s },
SLOWEST_THREAD_INFO:{ THREAD_ID: 2807315, INSERTED_ROW_COUNT: 13850544 }第一行是整體信息, 租戶ID 1004, task_id 1000817,當前階段為REPLICA BUILD的insert階段。當前并行度為2第二行是總掃描行數,總排序行數,總插入行數信息。第三行是當前階段時間和進度信息。當前階段已進行121.941s,插入進度為57.87%,預計 還有16.725s能完成插入階段。第四行是最慢線程信息。最慢線程的線程id為2236847, 處理了3188860行
局部索引的特殊處理
局部索引是存在優化,可能會串行分批發送inner_sql。在這種情況下,為了不產生混淆。針對局部索引。會在輸出tenant_id后task_id后添加build local index batch num: %ld的信息。
//示例
TENANT_ID: 1, TASK_ID: 3051020, build local index batch num: 2, STATUS: REPLICA BUILD, SORT_PHASE2, PARALLELISM: %ld, SORT_PHASE2_THREAD_NUM: %ld,
ROW_COUNT_INFO:{ROW_SCANNED: %ld, ROW_SORTED: %ld, ROW_INSERTED: %ld },
SORT_PHASE2_PROGRESS_INFO:{ SORT_PHASE2_TIME_ELAPSED: %.3fs, SORT_PHASE2_PROGRESS: %.2f%%, SORT_PHASE2_TIME_REMAINING: %.3fs },
SLOWEST_THREAD_INFO:{ THREAD_ID: %ld, SORTRD_ROW_COUNT: %ld },
TEMP_FILE_INFO:{ DUMP_SIZE: %ld, COMPRESS_TYPE: %s }build local index batch num: 2 表示目前是局部索引的第二批執行。其他語義不變。
新增 __all_virtual_ddl_diagnose_info 虛擬表
各字段說明
Field | Type | Null | Key | Default | Extra |
---|---|---|---|---|---|
tenant_id | bigint(20) | NO | PK | 租戶ID | |
ddl_task_id | bigint(20) | NO | PK | 任務ID | |
object_table_id | bigint(20) | NO | PK | 目標表ID | |
op_type | varchar(64) | NO | PK | DDL操作類型 | |
create_time | timestamp | NO | PK | DDL開始時間 | |
finish_time | timestamp | NULL | DDL結束時間 | ||
diagnose_info | varchar(1024) | NULL | DDL執行診斷信息 |
目前虛擬表會輸出正在進行的DDL和最近完成的100條DDL的信息。因為只能對走sql 路徑的DDL進行診斷。類型如下
DDL_CREATE_INDEX
DDL_CREATE_PARTITIONED_LOCAL_INDEX
DDL_MODIFY_COLUMN
DDL_ADD_PRIMARY_KEY
DDL_ALTER_PRIMARY_KEY
DDL_ALTER_PARTITION_BY
DDL_TABLE_REDEFINITION
診斷信息輸出
還未開始掃描的DDL
對于正在進行的DDL, 在掃描開始前無統計信息,此時message為空。
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+---------------+
| tenant_id | ddl_task_id | object_table_id | opname | create_time | finish_time | diagnose_info |
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+---------------+
| 1004 | 2079489 | 500005 | DDL_CREATE_PARTITIONED_LOCAL_INDEX | 2024-12-30 19:21:13.619570 | 2024-12-30 19:21:14.297205 | |
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+---------------+
還未開始排序的DDL
在掃描階段無法感知異常信息。在這個階段因此僅輸出 “Scanning” 信息。
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+---------------+
| tenant_id | ddl_task_id | object_table_id | opname | create_time | finish_time | diagnose_info |
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+---------------+
| 1004 | 2079489 | 500005 | DDL_CREATE_PARTITIONED_LOCAL_INDEX | 2024-12-30 19:21:13.619570 | 2024-12-30 19:21:16.706037 | Scanning |
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+---------------+
空算子DDL
沒有找到對應的sql_monitor_node或者算子總行數為0的情況。此時不進行診斷。不會出現在虛擬表中。
開始排序的DDL
THREAD_INFO: { parallel_num : %ld, row_max: %ld, row_max_thread_id: %ld, row_min: %ld, row_min_thread_id: %ld }
展示并行度,還有行數最多和最少的線程信息。
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+-------------------------------------------------------------------------------------------------------------------------------+
| tenant_id | ddl_task_id | object_table_id | opname | create_time | finish_time | diagnose_info |
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+-------------------------------------------------------------------------------------------------------------------------------+
| 1004 | 2079489 | 500005 | DDL_CREATE_PARTITIONED_LOCAL_INDEX | 2024-12-30 15:35:37.995584 | 2024-12-30 19:21:16.706037 | THREAD_INFO: { parallel_num : 5, row_max: 6512288, row_max_thread_id: 3025919, row_min: 6243328, row_min_thread_id: 3025940 } |
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+-------------------------------------------------------------------------------------------------------------------------------+
執行完成的DDL
對于已經執行成功的DDL, 會輸出以下信息
THREAD_INFO: { parallel_num : %ld, row_max: %ld, row_max_thread_id: %ld, row_min: %ld, row_min_thread_id: %ld slowest_thread_id: %ld },
TIME_INFO: { scan_time: %.3fs, sort_time: %.3fs, insert_time: %.3fs }
除了 并行度, 還有行數最多和最少的線程信息,另外還有執行最慢的線程id。
另外,還統計了各個階段花費的時間信息。
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tenant_id | ddl_task_id | object_table_id | opname | create_time | finish_time | diagnose_info |
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1004 | 2069349 | 500907 | DDL_CREATE_PARTITIONED_LOCAL_INDEX | 2024-12-30 19:16:28.035185 | 2024-12-30 19:19:39.831287 | THREAD_INFO: { parallel_num : 5, row_max: 6512288, row_max_thread_id: 3025937, row_min: 6243328, row_min_thread_id: 3025940 slowest_thread_id: 3025936 }, TIME_INFO: { scan_time: 50.350s, sort_time: 70.715s, insert_time: 61.291s } |
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
發生過重試的DDL
發生重試的DDL ,會在最前面輸出嘗試次數信息
正在進行的DDL輸出 當前次數,已經完成的DDL輸出最高次數
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tenant_id | ddl_task_id | object_table_id | opname | create_time | finish_time | diagnose_info |
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1004 | 2069349 | 500907 | DDL_CREATE_PARTITIONED_LOCAL_INDEX | 2024-12-30 19:16:28.035185 | 2024-12-30 19:19:39.831287 | try count: 2, THREAD_INFO: { parallel_num : 5, row_max: 6512288, row_max_thread_id: 3025937, row_min: 6243328, row_min_thread_id: 3025940 slowest_thread_id: 3025936 }, TIME_INFO: { scan_time: 50.350s, sort_time: 70.715s, insert_time: 61.291s } |
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
異常診斷信息輸出
分析診斷結果,如果存在異常場景,會輸出診斷提醒。
數據傾斜:
當行數最多的線程和最低的線程相差大于 100%時,會認為存在數據傾斜,會在上面診斷信息的基礎上添加以下信息。
The data skew is significant, with a low sampling rate or uneven sampling.
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tenant_id | ddl_task_id | object_table_id | opname | create_time | finish_time | diagnose_info |
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1004 | 4000296 | 501821 | DDL_CREATE_PARTITIONED_LOCAL_INDEX | 2024-12-31 09:47:39.521994 | 2024-12-31 09:50:25.560664 | THREAD_INFO: { parallel_num : 5, row_max: 6512288, row_max_thread_id: 3913304, row_min: 6243328, row_min_thread_id: 3913307 slowest_thread_id: 3913304 }, TIME_INFO: { scan_time: 26.608s, sort_time: 68.728s, insert_time: 60.114s }, DIAGNOSE_CASE: { The data skew is significant, with a low sampling rate or uneven sampling. }|
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
并行度低于預期
當實際處理數據的的算子數低于并行度時,會輸出以下信息
The number of threads with data is less than the dop. real_parallelism: %ld
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tenant_id | ddl_task_id | object_table_id | opname | create_time | finish_time | diagnose_info |
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1004 | 3046793 | 501563 | DDL_CREATE_INDEX | 2024-12-30 20:59:49.831452 | 2024-12-30 20:59:50.755917 | THREAD_INFO: { parallel_num : 5, row_max: 1, row_max_thread_id: 3150398, row_min: 1, row_min_thread_id: 3150398 slowest_thread_id: 3150399 }, TIME_INFO: { scan_time: 0.030s, sort_time: 0.002s, insert_time: 0.051s }, DIAGNOSE_CASE: { The number of threads with data is less than the dop. real_parallelism: 4. }|
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
局部索引的特殊處理
對于局部索引分批執行的場景。不輸出各階段的執行時間,同時也不進行數據傾斜度的計算。只診斷并行度低于預期的場景。同時會輸出批次信息。正在進行的DDL輸出當前處理的批次, 完成的DDL輸出總批次。
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tenant_id | ddl_task_id | object_table_id | opname | create_time | finish_time | diagnose_info |
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1004 | 2079489 | 500005 | DDL_CREATE_PARTITIONED_LOCAL_INDEX | 2024-12-30 15:35:37.995584 | 2024-12-30 19:21:16.706037 | build local index batch num: 3, THREAD_INFO: { parallel_num : 5, row_max: 6512288, row_max_thread_id: 3025919, row_min: 6243328, row_min_thread_id: 3025940 } |
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+