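RoboMaster - RDK X5 Power Rune Implementation Case Study (Part 1): Detection

Author: SkyXZ

CSDN: https://blog.csdn.net/xiongqi123123

Cnblogs: https://www.cnblogs.com/SkyXZ

In the RoboMaster 2025 season I was mainly responsible for the vision pipeline of the Power Rune (energy mechanism), and the overall algorithm is now complete. On the actual robot I use a Jetson Orin NX 16GB as the onboard computer, which offers 100 TOPS of compute. After TensorRT-optimized deployment, each frame takes only 2.5 ms from detection on the raw 1080p image, through PnP solving, to the final prediction. Since the algorithm work is done and it happens to be winter break, I am going to use the RDK X5 I have on hand to validate the Power Rune detection algorithm and see whether the RDK X5 is viable as a RoboMaster onboard computer.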

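1. Training the Model

The model I use is YOLOv8n-Pose. The dataset is the one open-sourced by the XJTLU GMaster team: zRzRzRzRzRzRzR/YOLO-of-RoboMaster-Keypoints-Detection-2023 (2023 Xi'an Jiaotong-Liverpool University GMaster team YOLO models: four-point armor plate model, five-point Power Rune model, and object detection for the regional-competition vision recognition board). Its annotation format is 4 classes with 5 keypoints per target.

There is not much to say about training. Once the environment is set up, train with the following dataset config and command: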

# buff.yaml
path: buff_format
train: train
val: test
kpt_shape: [5, 2]
names:
  0: RR
  1: RW
  2: BR
  3: BW

yolo pose train data=buff.yaml model=yolov8n-pose.pt epochs=200 batch=32 imgsz=640 iou=0.7 max_det=10 kobj=10 rect=True name=buff
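
To make the label layout implied by kpt_shape: [5, 2] concrete, here is a minimal sketch of parsing one label line. It assumes the standard Ultralytics pose label format (class id, normalized box center and size, then x/y per keypoint); the function name parse_pose_label and the sample line are made up for illustration and are not taken from the dataset.

```python
from typing import List, Tuple


def parse_pose_label(line: str, num_kpts: int = 5, kpt_dims: int = 2):
    """Parse one pose label line: class, cx, cy, w, h, then keypoints."""
    values = line.split()
    expected = 1 + 4 + num_kpts * kpt_dims  # 15 fields for kpt_shape [5, 2]
    if len(values) != expected:
        raise ValueError(f"expected {expected} fields, got {len(values)}")
    class_id = int(values[0])
    box: Tuple[float, ...] = tuple(float(v) for v in values[1:5])
    flat: List[float] = [float(v) for v in values[5:]]
    # Group the flat list into one tuple per keypoint
    keypoints = [tuple(flat[i:i + kpt_dims]) for i in range(0, len(flat), kpt_dims)]
    return class_id, box, keypoints


# Hypothetical sample line (class 2 is BR in buff.yaml)
sample = "2 0.5 0.5 0.2 0.3 0.41 0.36 0.59 0.36 0.60 0.64 0.40 0.64 0.50 0.50"
class_id, box, keypoints = parse_pose_label(sample)
print(class_id, box, len(keypoints))
```

If the labels carried a visibility flag per keypoint, kpt_shape would be [5, 3] and each line would have 20 fields instead of 15.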

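2. Quantizing the Model

With training done, we reach the most critical step: quantizing the model. First we need to modify the output head so that the bounding-box and classification information of the three feature levels are output separately. Concretely, find ./ultralytics/ultralytics/nn/modules/head.py in the YOLOv8 source and, at around line 64, replace the forward method of the Detect class with the following code: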

def forward(self, x):  # Detect
    result = []
    for i in range(self.nl):
        # NCHW -> NHWC: box-regression branch, then classification branch
        result.append(self.cv2[i](x[i]).permute(0, 2, 3, 1).contiguous())
        result.append(self.cv3[i](x[i]).permute(0, 2, 3, 1).contiguous())
    return result

Then replace the forward method of the Pose class, at around line 242, with the following code:

def forward(self, x):  # Pose
    detect_results = Detect.forward(self, x)
    kpt_results = []
    for i in range(self.nl):
        # NCHW -> NHWC: keypoint branch
        kpt_results.append(self.cv4[i](x[i]).permute(0, 2, 3, 1).contiguous())
    return (detect_results, kpt_results)
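
As a sanity check on what the modified heads emit, the following sketch lists the nine NHWC output shapes in the order the two forward methods return them. It assumes a square input, strides 8/16/32 and reg_max = 16 (the YOLOv8 defaults, matching REG 16 in the C++ code later); head_output_shapes is a hypothetical helper, not part of Ultralytics.

```python
def head_output_shapes(imgsz=640, num_classes=4, kpt_shape=(5, 2),
                       reg_max=16, strides=(8, 16, 32)):
    """Return the 9 output shapes: (box, cls) per stride, then keypoints per stride."""
    detect, kpts = [], []
    for s in strides:
        h = w = imgsz // s
        detect.append((1, h, w, 4 * reg_max))   # cv2: 4 sides x reg_max bins
        detect.append((1, h, w, num_classes))   # cv3: class logits
        kpts.append((1, h, w, kpt_shape[0] * kpt_shape[1]))  # cv4: keypoints
    return detect + kpts


shapes = head_output_shapes()
for index, shape in enumerate(shapes):
    print(index, shape)
```

The keypoint channel count is kpt_shape[0] * kpt_shape[1], so kpt_shape: [5, 2] gives 10 channels while a [5, 3] model would give 15. The C++ deployment code expects KPT_NUM * KPT_ENCODE channels, so it is worth comparing that product against the shapes reported for the converted model.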

After making the changes above, we can export the ONNX model with the following command:

yolo export model=/path/to/your/model format=onnx simplify=True opset=11 imgsz=640
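# Note: if the exported ONNX model is reported as having too high an IR version, set simplify=False

Next, enter the Docker image of D-Robotics' RDK algorithm toolchain (for installation and setup, see my other blog post "學弟一看就會的RDKX5模型轉換及部署，你確定不學？", which covers RDK X5 model conversion and deployment) and validate the ONNX model with the following command. The terminal then prints the model's basic information, structure information and operator information: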

hb_mapper checker --model-type onnx --march bayes-e --model /path/to/your/model.onnx

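The printed information shows that every operator in this model can run on the BPU.

Now we can write the quantization config file. We enable preprocess_on under calibration_parameters so that the calibration images are preprocessed automatically. You only need to change the onnx_model path and the cal_data_dir calibration image directory: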

model_parameters:
  onnx_model: 'buff_dim3.onnx'
  march: "bayes-e"
  layer_out_dump: False
  working_dir: 'buff'
  output_model_file_prefix: 'buff_dim3'
input_parameters:
  input_name: ""
  input_type_rt: 'nv12'
  input_type_train: 'rgb'
  input_layout_train: 'NCHW'
  norm_type: 'data_scale'
  scale_value: 0.003921568627451
calibration_parameters:
  cal_data_dir: './buff_format'
  cal_data_type: 'float32'
  preprocess_on: True
compiler_parameters:
  compile_mode: 'latency'
  debug: False
  optimize_level: 'O3'

Then use the following command to quantize the ONNX model into the Bin model supported by the RDK. The process is a bit slow; as long as no red errors appear, just wait patiently:
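
Because input_type_rt is 'nv12', the board-side model consumes NV12 frames, and the PreProcess function in the deployment code repacks OpenCV's I420 output into NV12. The sketch below shows that repacking on raw bytes; i420_to_nv12 is a hypothetical helper written for illustration and assumes even width and height.

```python
def i420_to_nv12(i420: bytes, width: int, height: int) -> bytes:
    """Repack planar I420 (Y, U, V planes) into NV12 (Y plane + interleaved UV)."""
    y_size = width * height
    uv_size = (width // 2) * (height // 2)
    if len(i420) != y_size + 2 * uv_size:
        raise ValueError("buffer size does not match an I420 frame")
    y_plane = i420[:y_size]
    u_plane = i420[y_size:y_size + uv_size]
    v_plane = i420[y_size + uv_size:]
    uv = bytearray(2 * uv_size)
    uv[0::2] = u_plane  # even bytes: U samples
    uv[1::2] = v_plane  # odd bytes: V samples
    return bytes(y_plane) + bytes(uv)


# Tiny 4x2 frame: 8 Y bytes, 2 U bytes, 2 V bytes
frame = bytes(range(8)) + bytes([0x10, 0x11]) + bytes([0x20, 0x21])
nv12 = i420_to_nv12(frame, width=4, height=2)
print(nv12.hex())
```

The scale_value of 0.003921568627451 in the same config is simply 1/255, which maps 8-bit pixel values to the [0, 1] range used during training.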

hb_mapper makertbin --model-type onnx --config /path/to/your/yaml

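In the conversion log we can find that the three outputs with shapes [1, 80, 80, 64], [1, 40, 40, 64] and [1, 20, 20, 64] are named output0, 352 and 368 respectively.

A dequantize node converts int8 quantized data back to float32, which costs extra compute and time. Removing these nodes avoids that overhead and speeds up inference. Use the following command to see which dequantize nodes can be removed: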

hb_model_modifier /path/to/your/convert_model.bin

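Open the generated log /open_explorer/Model/Buff/hb_model_modifier.log, which describes every node in detail. Using the three names found earlier (output0, 352 and 368) we can look up their details.

Remove those dequantize nodes with the following command (adjust the node names to match your own model):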

hb_model_modifier /path/to/your/convert_model.bin \
-r /model.22/cv2.0/cv2.0.2/Conv_output_0_HzDequantize \
-r /model.22/cv2.1/cv2.1.2/Conv_output_0_HzDequantize \
-r /model.22/cv2.2/cv2.2.2/Conv_output_0_HzDequantize
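
Once the dequantize nodes are removed, the three box outputs arrive as raw integers together with scale data, and the host code has to multiply them out itself (the C++ ProcessFeatureMap function does this only for cells that pass the score threshold). A minimal sketch of that step, assuming the tensor reports either a single scale or one scale per value; dequantize is a hypothetical helper, not a toolchain API:

```python
def dequantize(raw_values, scales):
    """Convert quantized integers back to floats: value = raw * scale."""
    if len(scales) == 1:
        # One scale shared by the whole tensor
        return [raw * scales[0] for raw in raw_values]
    if len(scales) != len(raw_values):
        raise ValueError("need one scale per value or a single shared scale")
    # One scale per channel
    return [raw * scale for raw, scale in zip(raw_values, scales)]


print(dequantize([10, -20], [0.5]))
print(dequantize([4, 8], [0.25, 0.5]))
```

Doing the multiplication on the CPU only for surviving candidates is usually much cheaper than dequantizing all 80x80 + 40x40 + 20x20 cells inside the model.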

Finally we get buff_dim3_modified.bin, the final model with the dequantize nodes removed. This model can be deployed directly, but before doing so we check it with the visualization command:

hb_perf /path/to/your/convert_model.bin

[Figure: buff_dim3_modified]

and check the model's inputs and outputs:

hrt_model_exec model_info --model_file /path/to/your/convert_model.bin

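3. Deploying the Model

With the final model buff_dim3_modified.bin in hand, we can deploy it. First overclock the X5 board so that both the CPU and the BPU run at their best. The commands are:

sudo bash -c "echo 1 > /sys/devices/system/cpu/cpufreq/boost"  # CPU: 1.8GHz
sudo bash -c "echo performance > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor" # Performance Mode
sudo bash -c "echo 1200000000 > /sys/kernel/debug/clk/bpu_mclk_2x_clk/clk_rate" # BPU: 1.2GHz

Deploying YOLOv8-Pose is broadly the same as deploying YOLOv5-Detect; for the general procedure see my other blog post "學弟一看就會的RDKX5模型轉換及部署，你確定不學？". On top of that, the main changes are the feature-map helper ProcessFeatureMap(), the output order in the model-info check GetModelInfo(), and the keypoint visualization in the drawing function DrawResults(). Running the code with an ordinary USB camera, CPU usage is low and BPU load is also fairly low, and even without multithreaded inference optimization it keeps up with the full 80 FPS of the camera I have.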

// C++ standard library
#include <algorithm>   // std::min
#include <chrono>      // timing
#include <cmath>       // std::exp, std::log
#include <cstdio>      // fopen, fread
#include <cstdlib>     // malloc, free
#include <cstring>     // memcpy, memset
#include <iomanip>     // output formatting
#include <iostream>    // console output
#include <string>
#include <utility>     // std::pair
#include <vector>

// OpenCV
#include <opencv2/opencv.hpp>      // main OpenCV header
#include <opencv2/dnn/dnn.hpp>     // cv::dnn::NMSBoxes

// Horizon RDK BPU API
#include "dnn/hb_dnn.h"                 // BPU core functions
#include "dnn/hb_dnn_ext.h"             // BPU extensions
#include "dnn/plugin/hb_dnn_layer.h"    // BPU layer definitions
#include "dnn/plugin/hb_dnn_plugin.h"   // BPU plugins
#include "dnn/hb_sys.h"                 // BPU system functions

// Error-check macro. It is only used inside functions that return bool, so it
// returns false on failure (returning the non-zero code would convert to true).
#define RDK_CHECK_SUCCESS(value, errmsg)                                      \
    do {                                                                      \
        auto ret_code = (value);                                              \
        if (ret_code != 0) {                                                  \
            std::cout << errmsg << ", error code:" << ret_code << std::endl;  \
            return false;                                                     \
        }                                                                     \
    } while (0)

// Default model and detection parameters
#define DEFAULT_MODEL_PATH "/root/Deep_Learning/YOLOV8-Pose/models/buff_dim3_modified.bin"  // default model path
#define DEFAULT_CLASSES_NUM 4          // number of classes
#define CLASSES_LIST "RR","RW","BR","BW"     // class names
#define KPT_NUM 5 // number of keypoints
// Keypoint encoding, 2: x,y  3: x,y,vis. The model emits kpt_shape[0] * kpt_shape[1]
// channels, so this must match the kpt_shape the model was trained with.
#define KPT_ENCODE 3
#define KPT_SCORE_THRESHOLD 0.5 // keypoint score threshold, default 0.25
#define REG 16 // number of bins used to discretize the box regression, default 16
#define DEFAULT_NMS_THRESHOLD 0.45f    // NMS IoU threshold
#define DEFAULT_SCORE_THRESHOLD 0.25f  // confidence threshold
#define DEFAULT_NMS_TOP_K 300          // maximum number of boxes kept by NMS
#define DEFAULT_FONT_SIZE 1.0f         // label font size
#define DEFAULT_FONT_THICKNESS 1.0f    // label font thickness
#define DEFAULT_LINE_SIZE 2.0f         // box line thickness

// Run-mode selection
#define DETECT_MODE 0    // 0: single image, 1: real-time detection
#define ENABLE_DRAW 1    // 0: drawing disabled, 1: drawing enabled
#define LOAD_FROM_DDR 0  // 0: load model from file, 1: load model from memory

// Feature-map sizes relative to the input size
#define H_8 (input_h_ / 8)
#define W_8 (input_w_ / 8)
#define H_16 (input_h_ / 16)
#define W_16 (input_w_ / 16)
#define H_32 (input_h_ / 32)
#define W_32 (input_w_ / 32)

using Clock = std::chrono::high_resolution_clock;

// Elapsed time between two time points, in milliseconds
static float ElapsedMs(Clock::time_point start, Clock::time_point end) {
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count() / 1000.0f;
}

// BPU detector for the YOLOv8-Pose model
class BPU_Detect {
public:
    // model_path:      path to the model file
    // classes_num:     number of classes
    // nms_threshold:   NMS threshold
    // score_threshold: confidence threshold
    // nms_top_k:       maximum number of boxes kept by NMS
    BPU_Detect(const std::string& model_path = DEFAULT_MODEL_PATH,
               int classes_num = DEFAULT_CLASSES_NUM,
               float nms_threshold = DEFAULT_NMS_THRESHOLD,
               float score_threshold = DEFAULT_SCORE_THRESHOLD,
               int nms_top_k = DEFAULT_NMS_TOP_K);
    ~BPU_Detect();

    bool Init();                                                 // initialize the BPU and the model
    bool Detect(const cv::Mat& input_img, cv::Mat& output_img);  // run detection on one image
    bool Release();                                              // release all resources

private:
    bool LoadModel();                           // load the model file
    bool GetModelInfo();                        // query model input/output information
    bool PreProcess(const cv::Mat& input_img);  // letterbox resize and NV12 conversion
    bool Inference();                           // run the model
    bool PostProcess();                         // decode outputs and run NMS
    void DrawResults(cv::Mat& img);             // draw detections on the image
    void PrintResults() const;                  // print detections to the console

    // Decode one feature level.
    // output_tensor_REG/CLA/KPT: box, class and keypoint tensors of this level
    // height, width: feature-map size
    // anchors: unused (kept from the YOLOv5 template; YOLOv8 is anchor-free)
    // conf_thres_raw: confidence threshold in logit space
    void ProcessFeatureMap(hbDNNTensor& output_tensor_REG,
                           hbDNNTensor& output_tensor_CLA,
                           hbDNNTensor& output_tensor_KPT,
                           int height, int width,
                           const std::vector<std::pair<double, double>>& anchors,
                           float conf_thres_raw);

    // Members (in constructor initialization order)
    std::string model_path_;      // model file path
    int classes_num_;             // number of classes
    float nms_threshold_;         // NMS threshold
    float score_threshold_;       // confidence threshold
    int nms_top_k_;               // maximum number of boxes kept by NMS
    bool is_initialized_;         // initialization flag
    float font_size_;             // label font size
    float font_thickness_;        // label font thickness
    float line_size_;             // box line thickness

    // BPU handles
    hbPackedDNNHandle_t packed_dnn_handle_;  // packed model handle
    hbDNNHandle_t dnn_handle_;               // model handle
    const char* model_name_;                 // model name

    // Input and output tensors
    hbDNNTensor input_tensor_;               // input tensor
    hbDNNTensor* output_tensors_;            // output tensor array
    hbDNNTensorProperties input_properties_; // input tensor properties

    hbDNNTaskHandle_t task_handle_;          // inference task handle

    int input_h_;                            // model input height
    int input_w_;                            // model input width

    // Detection results, all indexed as [class][detection]
    std::vector<std::vector<cv::Rect2d>> bboxes_;                 // boxes
    std::vector<std::vector<float>> scores_;                      // scores
    std::vector<std::vector<int>> indices_;                       // indices kept by NMS
    std::vector<std::vector<std::vector<cv::Point2f>>> kpts_xy_;  // keypoint coordinates
    std::vector<std::vector<std::vector<float>>> kpts_score_;     // keypoint raw scores

    // Letterbox parameters
    float x_scale_;                          // scale factor in x
    float y_scale_;                          // scale factor in y
    int x_shift_;                            // padding offset in x
    int y_shift_;                            // padding offset in y
    cv::Mat resized_img_;                    // letterboxed image
    float conf_thres_raw_;                   // class threshold in logit space
    float kpt_conf_thres_raw_;               // keypoint threshold in logit space

    // YOLOv5 anchors (unused by YOLOv8, kept from the template)
    std::vector<std::pair<double, double>> s_anchors_;  // small targets
    std::vector<std::pair<double, double>> m_anchors_;  // medium targets
    std::vector<std::pair<double, double>> l_anchors_;  // large targets

    int output_order_[9] = {0, 1, 2, 3, 4, 5, 6, 7, 8};  // output order mapping
    std::vector<std::string> class_names_;               // class names
};

// Constructor
BPU_Detect::BPU_Detect(const std::string& model_path,
                       int classes_num,
                       float nms_threshold,
                       float score_threshold,
                       int nms_top_k)
    : model_path_(model_path),
      classes_num_(classes_num),
      nms_threshold_(nms_threshold),
      score_threshold_(score_threshold),
      nms_top_k_(nms_top_k),
      is_initialized_(false),
      font_size_(DEFAULT_FONT_SIZE),
      font_thickness_(DEFAULT_FONT_THICKNESS),
      line_size_(DEFAULT_LINE_SIZE),
      packed_dnn_handle_(nullptr),
      dnn_handle_(nullptr),
      model_name_(nullptr),
      output_tensors_(nullptr),
      task_handle_(nullptr) {
    // Zero the C structs so that Release() never inspects garbage pointers
    std::memset(&input_tensor_, 0, sizeof(input_tensor_));
    std::memset(&input_properties_, 0, sizeof(input_properties_));

    class_names_ = {CLASSES_LIST};

    // Anchors from the YOLOv5 template
    std::vector<float> anchors = {10.0, 13.0, 16.0, 30.0, 33.0, 23.0,
                                  30.0, 61.0, 62.0, 45.0, 59.0, 119.0,
                                  116.0, 90.0, 156.0, 198.0, 373.0, 326.0};
    for (int i = 0; i < 3; i++) {
        s_anchors_.push_back({anchors[i * 2], anchors[i * 2 + 1]});
        m_anchors_.push_back({anchors[i * 2 + 6], anchors[i * 2 + 7]});
        l_anchors_.push_back({anchors[i * 2 + 12], anchors[i * 2 + 13]});
    }
}

// Destructor
BPU_Detect::~BPU_Detect() {
    if (is_initialized_) {
        Release();
    }
}

// Initialization
bool BPU_Detect::Init() {
    if (is_initialized_) {
        std::cout << "Already initialized!" << std::endl;
        return true;
    }
    auto init_start = Clock::now();
    if (!LoadModel()) {
        std::cout << "Failed to load model!" << std::endl;
        return false;
    }
    if (!GetModelInfo()) {
        std::cout << "Failed to get model info!" << std::endl;
        return false;
    }
    is_initialized_ = true;
    float init_time = ElapsedMs(init_start, Clock::now());
    std::cout << "\n============ Model Loading Time ============" << std::endl;
    std::cout << "Total init time: " << std::fixed << std::setprecision(2) << init_time << " ms" << std::endl;
    std::cout << "=========================================\n" << std::endl;
    return true;
}

// Load the model
bool BPU_Detect::LoadModel() {
    auto load_start = Clock::now();
    float init_time = 0.0f;  // model initialization time

#if LOAD_FROM_DDR
    // =============== Read the model file into memory ===============
    float read_time = 0.0f;
    auto read_start = Clock::now();
    FILE* fp = fopen(model_path_.c_str(), "rb");
    if (!fp) {
        std::cout << "Failed to open model file: " << model_path_ << std::endl;
        return false;
    }
    // Get the file size: seek to the end, read the position, seek back
    fseek(fp, 0, SEEK_END);
    size_t model_size = static_cast<size_t>(ftell(fp));
    fseek(fp, 0, SEEK_SET);
    void* model_data = malloc(model_size);
    if (!model_data) {
        std::cout << "Failed to allocate memory for model data" << std::endl;
        fclose(fp);
        return false;
    }
    size_t read_size = fread(model_data, 1, model_size, fp);
    fclose(fp);
    read_time = ElapsedMs(read_start, Clock::now());
    // Verify that the whole file was read
    if (read_size != model_size) {
        std::cout << "Failed to read model data, expected " << model_size
                  << " bytes, but got " << read_size << " bytes" << std::endl;
        free(model_data);
        return false;
    }

    // =============== Initialize the model from memory ===============
    auto init_start = Clock::now();
    const void* model_data_array[] = {model_data};
    int32_t model_data_length[] = {static_cast<int32_t>(model_size)};
    RDK_CHECK_SUCCESS(
        hbDNNInitializeFromDDR(&packed_dnn_handle_, model_data_array, model_data_length, 1),
        "Initialize model from DDR failed");
    free(model_data);  // release the temporary buffer
    init_time = ElapsedMs(init_start, Clock::now());
#else
    // =============== Initialize the model directly from file ===============
    auto init_start = Clock::now();
    const char* model_file_name = model_path_.c_str();
    RDK_CHECK_SUCCESS(
        hbDNNInitializeFromFiles(&packed_dnn_handle_, &model_file_name, 1),
        "Initialize model from file failed");
    init_time = ElapsedMs(init_start, Clock::now());
#endif

    // =============== Print timing statistics ===============
    float total_load_time = ElapsedMs(load_start, Clock::now());
    std::cout << "\n============ Model Loading Details ============" << std::endl;
#if LOAD_FROM_DDR
    std::cout << "File reading time: " << std::fixed << std::setprecision(2) << read_time << " ms" << std::endl;
#endif
    std::cout << "Model init time: " << std::fixed << std::setprecision(2) << init_time << " ms" << std::endl;
    std::cout << "Total loading time: " << std::fixed << std::setprecision(2) << total_load_time << " ms" << std::endl;
    std::cout << "===========================================\n" << std::endl;
    return true;
}

// Query model information
bool BPU_Detect::GetModelInfo() {
    // Model name list
    const char** model_name_list;
    int32_t model_count = 0;
    RDK_CHECK_SUCCESS(
        hbDNNGetModelNameList(&model_name_list, &model_count, packed_dnn_handle_),
        "hbDNNGetModelNameList failed");
    if (model_count > 1) {
        std::cout << "Model count: " << model_count << std::endl;
        std::cout << "Please check the model count!" << std::endl;
        return false;
    }
    model_name_ = model_name_list[0];

    // Model handle
    RDK_CHECK_SUCCESS(
        hbDNNGetModelHandle(&dnn_handle_, packed_dnn_handle_, model_name_),
        "hbDNNGetModelHandle failed");

    // Input information
    int32_t input_count = 0;
    RDK_CHECK_SUCCESS(hbDNNGetInputCount(&input_count, dnn_handle_), "hbDNNGetInputCount failed");
    RDK_CHECK_SUCCESS(
        hbDNNGetInputTensorProperties(&input_properties_, dnn_handle_, 0),
        "hbDNNGetInputTensorProperties failed");
    if (input_count > 1) {
        std::cout << "The model has more than one input node, please check!" << std::endl;
        return false;
    }
    if (input_properties_.validShape.numDimensions != 4) {
        std::cout << "The input tensor is not 4-dimensional (expected (1,3,640,640)), please check!" << std::endl;
        return false;
    }
    if (input_properties_.tensorType == HB_DNN_IMG_TYPE_NV12) {
        std::cout << "Input tensor type: HB_DNN_IMG_TYPE_NV12" << std::endl;
    } else {
        std::cout << "Input tensor type is not HB_DNN_IMG_TYPE_NV12, please check!" << std::endl;
        return false;
    }

    // Input size (NCHW)
    input_h_ = input_properties_.validShape.dimensionSize[2];
    input_w_ = input_properties_.validShape.dimensionSize[3];
    std::cout << "Input shape: (" << input_properties_.validShape.dimensionSize[0]
              << ", " << input_properties_.validShape.dimensionSize[1]
              << ", " << input_h_ << ", " << input_w_ << ")" << std::endl;

    // Output information
    int32_t output_count = 0;
    RDK_CHECK_SUCCESS(hbDNNGetOutputCount(&output_count, dnn_handle_), "hbDNNGetOutputCount failed");
    std::cout << "output_count: " << output_count << std::endl;
    if (output_count != 9) {
        std::cout << "Expected 9 outputs, please check the model!" << std::endl;
        return false;
    }
    // Value-initialize so every virAddr starts out as nullptr
    output_tensors_ = new hbDNNTensor[output_count]();

    // =============== Output order mapping ===============
    // The head has three scales (stride 8, 16, 32); each scale has a box output (64 channels),
    // a class output and a keypoint output. Expected (H, W, C) per logical output:
    int32_t expected_shapes[9][3] = {
        {H_8, W_8, 64},                      // output[order[0]]: (1, H/8,  W/8,  64)
        {H_8, W_8, DEFAULT_CLASSES_NUM},     // output[order[1]]: (1, H/8,  W/8,  CLASSES_NUM)
        {H_16, W_16, 64},                    // output[order[2]]: (1, H/16, W/16, 64)
        {H_16, W_16, DEFAULT_CLASSES_NUM},   // output[order[3]]: (1, H/16, W/16, CLASSES_NUM)
        {H_32, W_32, 64},                    // output[order[4]]: (1, H/32, W/32, 64)
        {H_32, W_32, DEFAULT_CLASSES_NUM},   // output[order[5]]: (1, H/32, W/32, CLASSES_NUM)
        {H_8, W_8, KPT_NUM * KPT_ENCODE},    // output[order[6]]: (1, H/8,  W/8,  KPT_NUM * KPT_ENCODE)
        {H_16, W_16, KPT_NUM * KPT_ENCODE},  // output[order[7]]: (1, H/16, W/16, KPT_NUM * KPT_ENCODE)
        {H_32, W_32, KPT_NUM * KPT_ENCODE},  // output[order[8]]: (1, H/32, W/32, KPT_NUM * KPT_ENCODE)
    };
    // For each expected output, find the actual output node with the same shape
    for (int i = 0; i < 9; i++) {
        for (int j = 0; j < 9; j++) {
            hbDNNTensorProperties output_properties;
            RDK_CHECK_SUCCESS(
                hbDNNGetOutputTensorProperties(&output_properties, dnn_handle_, j),
                "Get output tensor properties failed");
            int32_t actual_h = output_properties.validShape.dimensionSize[1];
            int32_t actual_w = output_properties.validShape.dimensionSize[2];
            int32_t actual_c = output_properties.validShape.dimensionSize[3];
            if (actual_h == expected_shapes[i][0] &&
                actual_w == expected_shapes[i][1] &&
                actual_c == expected_shapes[i][2]) {
                output_order_[i] = j;
                break;
            }
        }
    }
    // Check the mapping (the indices of a valid permutation sum to 0 + 1 + ... + 8)
    int order_sum = 0;
    for (int i = 0; i < 9; i++) {
        order_sum += output_order_[i];
    }
    if (order_sum == 0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8) {
        std::cout << "Outputs order check SUCCESS, continue." << std::endl;
        std::cout << "order = {";
        for (int i = 0; i < 9; i++) {
            std::cout << output_order_[i] << ", ";
        }
        std::cout << "}" << std::endl;
    } else {
        std::cout << "Outputs order check FAILED, use default" << std::endl;
        for (int i = 0; i < 9; i++) {
            output_order_[i] = i;
        }
    }
    return true;
}

// Detection
bool BPU_Detect::Detect(const cv::Mat& input_img, cv::Mat& output_img) {
    if (!is_initialized_) {
        std::cout << "Please initialize first!" << std::endl;
        return false;
    }
    if (input_img.empty()) {
        std::cout << "Input image is empty!" << std::endl;
        return false;
    }

    float preprocess_time = 0.0f;
    float infer_time = 0.0f;
    float postprocess_time = 0.0f;
    float draw_time = 0.0f;
    float total_time = 0.0f;
    auto total_start = Clock::now();

#if ENABLE_DRAW
    input_img.copyTo(output_img);
#endif
    bool success = true;

    // Preprocess
    {
        auto start = Clock::now();
        success = PreProcess(input_img);
        preprocess_time = ElapsedMs(start, Clock::now());
        if (!success) {
            std::cout << "Preprocess failed" << std::endl;
            goto cleanup;
        }
    }
    // Inference
    {
        auto start = Clock::now();
        success = Inference();
        infer_time = ElapsedMs(start, Clock::now());
        if (!success) {
            std::cout << "Inference failed" << std::endl;
            goto cleanup;
        }
    }
    // Postprocess
    {
        auto start = Clock::now();
        success = PostProcess();
        postprocess_time = ElapsedMs(start, Clock::now());
        if (!success) {
            std::cout << "Postprocess failed" << std::endl;
            goto cleanup;
        }
    }
    // Draw results
    {
        auto start = Clock::now();
        DrawResults(output_img);
        draw_time = ElapsedMs(start, Clock::now());
    }
    total_time = ElapsedMs(total_start, Clock::now());

    // Print timing statistics
    std::cout << "\n============ Time Statistics ============" << std::endl;
    std::cout << "Preprocess time: " << std::fixed << std::setprecision(2) << preprocess_time << " ms" << std::endl;
    std::cout << "Inference time: " << std::fixed << std::setprecision(2) << infer_time << " ms" << std::endl;
    std::cout << "Postprocess time: " << std::fixed << std::setprecision(2) << postprocess_time << " ms" << std::endl;
    std::cout << "Draw time: " << std::fixed << std::setprecision(2) << draw_time << " ms" << std::endl;
    std::cout << "Total time: " << std::fixed << std::setprecision(2) << total_time << " ms" << std::endl;
    std::cout << "FPS: " << std::fixed << std::setprecision(2) << 1000.0f / total_time << std::endl;
    std::cout << "======================================\n" << std::endl;

cleanup:
    // Release the task
    if (task_handle_) {
        hbDNNReleaseTask(task_handle_);
        task_handle_ = nullptr;
    }
    // Release the input buffer
    if (input_tensor_.sysMem[0].virAddr) {
        hbSysFreeMem(&(input_tensor_.sysMem[0]));
        input_tensor_.sysMem[0].virAddr = nullptr;
    }
    // Inference() allocates the output buffers on every call, so release them
    // here as well; otherwise real-time mode leaks memory on every frame.
    for (int i = 0; output_tensors_ && i < 9; i++) {
        if (output_tensors_[i].sysMem[0].virAddr) {
            hbSysFreeMem(&output_tensors_[i].sysMem[0]);
            output_tensors_[i].sysMem[0].virAddr = nullptr;
        }
    }
    return success;
}

// Preprocessing
bool BPU_Detect::PreProcess(const cv::Mat& input_img) {
    // Letterbox: keep the aspect ratio and pad to the model input size
    x_scale_ = std::min(1.0f * input_h_ / input_img.rows, 1.0f * input_w_ / input_img.cols);
    y_scale_ = x_scale_;
    int new_w = input_img.cols * x_scale_;
    x_shift_ = (input_w_ - new_w) / 2;
    int x_other = input_w_ - new_w - x_shift_;
    int new_h = input_img.rows * y_scale_;
    y_shift_ = (input_h_ - new_h) / 2;
    int y_other = input_h_ - new_h - y_shift_;
    cv::resize(input_img, resized_img_, cv::Size(new_w, new_h));
    cv::copyMakeBorder(resized_img_, resized_img_, y_shift_, y_other, x_shift_, x_other,
                       cv::BORDER_CONSTANT, cv::Scalar(127, 127, 127));

    // Convert BGR to planar I420, then repack to NV12 below
    cv::Mat yuv_mat;
    cv::cvtColor(resized_img_, yuv_mat, cv::COLOR_BGR2YUV_I420);

    // Prepare the input tensor
    input_tensor_.properties = input_properties_;
    input_tensor_.properties.validShape.dimensionSize[0] = 1;  // batch size
    input_tensor_.properties.validShape.dimensionSize[1] = 3;  // channels
    input_tensor_.properties.validShape.dimensionSize[2] = input_h_;
    input_tensor_.properties.validShape.dimensionSize[3] = input_w_;
    RDK_CHECK_SUCCESS(
        hbSysAllocCachedMem(&input_tensor_.sysMem[0], int(3 * input_h_ * input_w_ / 2)),
        "Allocate input memory failed");

    uint8_t* yuv = yuv_mat.ptr<uint8_t>();
    uint8_t* ynv12 = (uint8_t*)input_tensor_.sysMem[0].virAddr;
    int uv_height = input_h_ / 2;
    int uv_width = input_w_ / 2;
    int y_size = input_h_ * input_w_;

    // Copy the Y plane
    memcpy(ynv12, yuv, y_size);
    // Interleave the U and V planes into the NV12 UV plane
    uint8_t* nv12 = ynv12 + y_size;
    uint8_t* u_data = yuv + y_size;
    uint8_t* v_data = u_data + uv_height * uv_width;
    for (int i = 0; i < uv_width * uv_height; i++) {
        *nv12++ = *u_data++;
        *nv12++ = *v_data++;
    }
    // Flush the CPU cache so the BPU sees the data
    hbSysFlushMem(&input_tensor_.sysMem[0], HB_SYS_MEM_CACHE_CLEAN);
    return true;
}

// Inference
bool BPU_Detect::Inference() {
    // Release any previous task first
    if (task_handle_) {
        hbDNNReleaseTask(task_handle_);
        task_handle_ = nullptr;
    }
    // Input tensor properties
    input_tensor_.properties = input_properties_;
    input_tensor_.properties.validShape.dimensionSize[0] = 1;  // batch size
    input_tensor_.properties.validShape.dimensionSize[1] = 3;  // channels
    input_tensor_.properties.validShape.dimensionSize[2] = input_h_;
    input_tensor_.properties.validShape.dimensionSize[3] = input_w_;

    // Query output tensor properties and allocate the output buffers
    for (int i = 0; i < 9; i++) {
        hbDNNTensorProperties output_properties;
        RDK_CHECK_SUCCESS(
            hbDNNGetOutputTensorProperties(&output_properties, dnn_handle_, i),
            "Get output tensor properties failed");
        output_tensors_[i].properties = output_properties;
        int out_aligned_size = output_properties.alignedByteSize;
        RDK_CHECK_SUCCESS(
            hbSysAllocCachedMem(&output_tensors_[i].sysMem[0], out_aligned_size),
            "Allocate output memory failed");
        if (!output_tensors_[i].sysMem[0].virAddr) {
            std::cout << "Failed to allocate memory for output tensor " << i << std::endl;
            return false;
        }
    }

    // Inference control parameters
    hbDNNInferCtrlParam infer_ctrl_param;
    HB_DNN_INITIALIZE_INFER_CTRL_PARAM(&infer_ctrl_param);

    // Run inference
    int ret = hbDNNInfer(&task_handle_, &output_tensors_, &input_tensor_, dnn_handle_, &infer_ctrl_param);
    if (ret != 0) {
        std::cout << "Model inference failed with error code: " << ret << std::endl;
        return false;
    }
    // Wait for the task to finish
    ret = hbDNNWaitTaskDone(task_handle_, 0);
    if (ret != 0) {
        std::cout << "Wait task done failed with error code: " << ret << std::endl;
        return false;
    }
    return true;
}

// Postprocessing
bool BPU_Detect::PostProcess() {
    // Reset the per-class result containers
    bboxes_.assign(classes_num_, {});
    scores_.assign(classes_num_, {});
    indices_.assign(classes_num_, {});
    kpts_xy_.assign(classes_num_, {});
    kpts_score_.assign(classes_num_, {});

    // Thresholds in logit space: sigmoid is monotonic, so raw values can be
    // compared against the inverse sigmoid of the threshold
    conf_thres_raw_ = -std::log(1 / score_threshold_ - 1);
    kpt_conf_thres_raw_ = -std::log(1 / KPT_SCORE_THRESHOLD - 1);

    // Decode the three scales, using the output order found in GetModelInfo()
    const int* o = output_order_;
    ProcessFeatureMap(output_tensors_[o[0]], output_tensors_[o[1]], output_tensors_[o[6]],
                      H_8, W_8, s_anchors_, conf_thres_raw_);
    ProcessFeatureMap(output_tensors_[o[2]], output_tensors_[o[3]], output_tensors_[o[7]],
                      H_16, W_16, m_anchors_, conf_thres_raw_);
    ProcessFeatureMap(output_tensors_[o[4]], output_tensors_[o[5]], output_tensors_[o[8]],
                      H_32, W_32, l_anchors_, conf_thres_raw_);

    // Per-class NMS
    for (int i = 0; i < classes_num_; i++) {
        cv::dnn::NMSBoxes(bboxes_[i], scores_[i], score_threshold_, nms_threshold_,
                          indices_[i], 1.f, nms_top_k_);
    }
    return true;
}

// Print detection results
void BPU_Detect::PrintResults() const {
    int total_detections = 0;
    for (int cls_id = 0; cls_id < classes_num_; cls_id++) {
        total_detections += indices_[cls_id].size();
    }
    std::cout << "\n============ Detection Results ============" << std::endl;
    std::cout << "Total detections: " << total_detections << std::endl;
    for (int cls_id = 0; cls_id < classes_num_; cls_id++) {
        if (indices_[cls_id].empty()) {
            continue;
        }
        std::cout << "\nClass: " << class_names_[cls_id] << std::endl;
        std::cout << "Number of detections: " << indices_[cls_id].size() << std::endl;
        std::cout << "Details:" << std::endl;
        for (size_t i = 0; i < indices_[cls_id].size(); i++) {
            int idx = indices_[cls_id][i];
            // Map the box from letterboxed coordinates back to the original image
            float x1 = (bboxes_[cls_id][idx].x - x_shift_) / x_scale_;
            float y1 = (bboxes_[cls_id][idx].y - y_shift_) / y_scale_;
            float x2 = x1 + (bboxes_[cls_id][idx].width) / x_scale_;
            float y2 = y1 + (bboxes_[cls_id][idx].height) / y_scale_;
            float score = scores_[cls_id][idx];
            std::cout << "  Detection " << i + 1 << ":" << std::endl;
            std::cout << "    Position: (" << x1 << ", " << y1 << ") to (" << x2 << ", " << y2 << ")" << std::endl;
            std::cout << "    Confidence: " << std::fixed << std::setprecision(2) << score * 100 << "%" << std::endl;
        }
    }
    std::cout << "========================================\n" << std::endl;
}

// Draw results
void BPU_Detect::DrawResults(cv::Mat& img) {
#if ENABLE_DRAW
    for (int cls_id = 0; cls_id < classes_num_; cls_id++) {
        for (size_t i = 0; i < indices_[cls_id].size(); i++) {
            int idx = indices_[cls_id][i];
            float x1 = (bboxes_[cls_id][idx].x - x_shift_) / x_scale_;
            float y1 = (bboxes_[cls_id][idx].y - y_shift_) / y_scale_;
            float x2 = x1 + (bboxes_[cls_id][idx].width) / x_scale_;
            float y2 = y1 + (bboxes_[cls_id][idx].height) / y_scale_;
            float score = scores_[cls_id][idx];
            // Bounding box
            cv::rectangle(img, cv::Point(x1, y1), cv::Point(x2, y2), cv::Scalar(255, 0, 0), line_size_);
            // Label
            std::string text = class_names_[cls_id] + ": " + std::to_string(static_cast<int>(score * 100)) + "%";
            cv::putText(img, text, cv::Point(x1, y1 - 5), cv::FONT_HERSHEY_SIMPLEX, font_size_,
                        cv::Scalar(0, 0, 255), font_thickness_, cv::LINE_AA);

            // Keypoints of this detection (stored per class, same index as the box)
            const std::vector<cv::Point2f>& kpt_xy = kpts_xy_[cls_id][idx];
            const std::vector<float>& kpt_score = kpts_score_[cls_id][idx];
            for (int j = 0; j < KPT_NUM; ++j) {
                if (kpt_score[j] < kpt_conf_thres_raw_) {
                    continue;
                }
                int x = static_cast<int>((kpt_xy[j].x - x_shift_) / x_scale_);
                int y = static_cast<int>((kpt_xy[j].y - y_shift_) / y_scale_);
                // Outer red circle, inner yellow circle
                cv::circle(img, cv::Point(x, y), 5, cv::Scalar(0, 0, 255), -1);
                cv::circle(img, cv::Point(x, y), 2, cv::Scalar(0, 255, 255), -1);
                // Keypoint index: yellow text with a red outline
                cv::putText(img, std::to_string(j), cv::Point(x, y), cv::FONT_HERSHEY_SIMPLEX, 0.5,
                            cv::Scalar(0, 0, 255), 3, cv::LINE_AA);
                cv::putText(img, std::to_string(j), cv::Point(x, y), cv::FONT_HERSHEY_SIMPLEX, 0.5,
                            cv::Scalar(0, 255, 255), 1, cv::LINE_AA);
            }
        }
    }
#endif
    PrintResults();
}

// Feature-map decoding helper
void BPU_Detect::ProcessFeatureMap(hbDNNTensor& output_tensor_REG,
                                   hbDNNTensor& output_tensor_CLA,
                                   hbDNNTensor& output_tensor_KPT,
                                   int height, int width,
                                   const std::vector<std::pair<double, double>>& anchors,
                                   float conf_thres_raw) {
    (void)anchors;  // unused: YOLOv8 is anchor-free

    // Check that the buffers are valid
    if (!output_tensor_REG.sysMem[0].virAddr || !output_tensor_REG.properties.scale.scaleData ||
        !output_tensor_CLA.sysMem[0].virAddr || !output_tensor_KPT.sysMem[0].virAddr) {
        std::cout << "Invalid memory for tensors!" << std::endl;
        return;
    }
    const size_t bbox_len = output_tensor_REG.properties.alignedByteSize / sizeof(int32_t);
    const int32_t scale_len = output_tensor_REG.properties.scale.scaleLen;
    std::cout << "REG tensor aligned size: " << output_tensor_REG.properties.alignedByteSize << std::endl;
    std::cout << "REG tensor scale length: " << scale_len << std::endl;

    // Invalidate the CPU cache before reading what the BPU wrote
    hbSysFlushMem(&output_tensor_REG.sysMem[0], HB_SYS_MEM_CACHE_INVALIDATE);
    hbSysFlushMem(&output_tensor_CLA.sysMem[0], HB_SYS_MEM_CACHE_INVALIDATE);
    hbSysFlushMem(&output_tensor_KPT.sysMem[0], HB_SYS_MEM_CACHE_INVALIDATE);

    // The box output is quantized int32 (its dequantize node was removed); class and keypoints are float
    auto* bbox_base = static_cast<int32_t*>(output_tensor_REG.sysMem[0].virAddr);
    auto* s_bbox_raw = bbox_base;
    auto* s_cls_raw = static_cast<float*>(output_tensor_CLA.sysMem[0].virAddr);
    auto* s_kpts_raw = static_cast<float*>(output_tensor_KPT.sysMem[0].virAddr);
    auto* s_bbox_scale = static_cast<float*>(output_tensor_REG.properties.scale.scaleData);
    std::cout << "s_bbox_raw valid range: " << 0 << " to " << (bbox_len - 1) << std::endl;
    std::cout << "s_bbox_scale valid range: " << 0 << " to " << (scale_len - 1) << std::endl;

    const float stride = (height == H_8 ? 8.0f : (height == H_16 ? 16.0f : 32.0f));

    // Visit every cell of the feature map
    for (int h = 0; h < height; h++) {
        for (int w = 0; w < width; w++) {
            float* cur_s_cls_raw = s_cls_raw;
            int32_t* cur_s_bbox_raw = s_bbox_raw;
            float* cur_s_kpts_raw = s_kpts_raw;
            s_cls_raw += DEFAULT_CLASSES_NUM;
            s_bbox_raw += REG * 4;
            s_kpts_raw += KPT_NUM * KPT_ENCODE;

            // Class with the highest logit
            int cls_id = 0;
            for (int i = 1; i < DEFAULT_CLASSES_NUM; i++) {
                if (cur_s_cls_raw[i] > cur_s_cls_raw[cls_id]) {
                    cls_id = i;
                }
            }
            if (cur_s_cls_raw[cls_id] < conf_thres_raw) {
                continue;
            }
            float score = 1 / (1 + std::exp(-cur_s_cls_raw[cls_id]));

            // DFL decode: softmax-weighted expectation over REG bins for each side
            float ltrb[4], sum;
            for (int i = 0; i < 4; i++) {
                ltrb[i] = 0.;
                sum = 0.;
                for (int j = 0; j < REG; j++) {
                    size_t bbox_index = (cur_s_bbox_raw - bbox_base) + REG * i + j;
                    if (bbox_index >= bbox_len) {
                        std::cout << "bbox_index out of range: " << bbox_index << std::endl;
                        return;
                    }
                    if (j >= scale_len) {
                        std::cout << "scale index out of range: " << j << std::endl;
                        return;
                    }
                    float exp_val = float(cur_s_bbox_raw[REG * i + j]) * s_bbox_scale[j];
                    // Clamp the exponent so std::exp cannot overflow
                    if (exp_val > 88.0f) exp_val = 88.0f;
                    if (exp_val < -88.0f) exp_val = -88.0f;
                    float dfl = std::exp(exp_val);
                    ltrb[i] += dfl * j;
                    sum += dfl;
                }
                if (sum > 0) {
                    ltrb[i] /= sum;
                }
            }

            // Box corners in letterboxed input coordinates
            float x1 = (w + 0.5 - ltrb[0]) * stride;
            float y1 = (h + 0.5 - ltrb[1]) * stride;
            float x2 = (w + 0.5 + ltrb[2]) * stride;
            float y2 = (h + 0.5 + ltrb[3]) * stride;

            // Keypoints
            std::vector<cv::Point2f> kpt_xy(KPT_NUM);
            std::vector<float> kpt_score(KPT_NUM);
            for (int j = 0; j < KPT_NUM; j++) {
                float x = (cur_s_kpts_raw[KPT_ENCODE * j] * 2.0 + w) * stride;
                float y = (cur_s_kpts_raw[KPT_ENCODE * j + 1] * 2.0 + h) * stride;
                // Without a visibility channel, treat every keypoint as exactly at the threshold
                float vis = (KPT_ENCODE >= 3) ? cur_s_kpts_raw[KPT_ENCODE * j + 2] : kpt_conf_thres_raw_;
                kpt_xy[j] = cv::Point2f(x, y);
                kpt_score[j] = vis;
            }

            // Store the detection under its class so box and keypoint indices stay aligned
            bboxes_[cls_id].push_back(cv::Rect2d(x1, y1, x2 - x1, y2 - y1));
            scores_[cls_id].push_back(score);
            kpts_xy_[cls_id].push_back(kpt_xy);
            kpts_score_[cls_id].push_back(kpt_score);
        }
    }
}

// Release resources
bool BPU_Detect::Release() {
    if (!is_initialized_) {
        return true;
    }
    // Release the task
    if (task_handle_) {
        hbDNNReleaseTask(task_handle_);
        task_handle_ = nullptr;
    }
    // Release the input buffer
    if (input_tensor_.sysMem[0].virAddr) {
        hbSysFreeMem(&input_tensor_.sysMem[0]);
        input_tensor_.sysMem[0].virAddr = nullptr;
    }
    // Release the output buffers
    if (output_tensors_) {
        for (int i = 0; i < 9; i++) {
            if (output_tensors_[i].sysMem[0].virAddr) {
                hbSysFreeMem(&output_tensors_[i].sysMem[0]);
                output_tensors_[i].sysMem[0].virAddr = nullptr;
            }
        }
        delete[] output_tensors_;
        output_tensors_ = nullptr;
    }
    // Release the model
    if (packed_dnn_handle_) {
        hbDNNRelease(packed_dnn_handle_);
        packed_dnn_handle_ = nullptr;
    }
    is_initialized_ = false;
    return true;
}

int main() {
    // Create and initialize the detector
    BPU_Detect detector;
    if (!detector.Init()) {
        std::cout << "Failed to initialize detector" << std::endl;
        return -1;
    }

#if DETECT_MODE == 0
    // Single-image mode
    std::cout << "Single image detection mode" << std::endl;
    cv::Mat input_img = cv::imread("/root/Deep_Learning/YOLOV8-Pose/imgs/0a84fc03-1873.jpg");
    if (input_img.empty()) {
        std::cout << "Failed to load image" << std::endl;
        return -1;
    }
    cv::Mat output_img;
    if (!detector.Detect(input_img, output_img)) {
        std::cout << "Detection failed" << std::endl;
        return -1;
    }
#if ENABLE_DRAW
    // Save the result
    cv::imwrite("cpp_result.jpg", output_img);
#endif

#else
    // Real-time mode
    std::cout << "Real-time detection mode" << std::endl;
    cv::VideoCapture cap(0);
    if (!cap.isOpened()) {
        std::cout << "Failed to open camera" << std::endl;
        return -1;
    }
    cv::Mat frame, output_frame;
    while (true) {
        cap >> frame;
        if (frame.empty()) {
            std::cout << "Failed to read frame" << std::endl;
            break;
        }
        if (!detector.Detect(frame, output_frame)) {
            std::cout << "Detection failed" << std::endl;
            break;
        }
#if ENABLE_DRAW
        cv::imshow("Real-time Detection", output_frame);
        // Press 'q' to quit
        if (cv::waitKey(1) == 'q') {
            break;
        }
#endif
    }
    cap.release();
#if ENABLE_DRAW
    cv::destroyAllWindows();
#endif
#endif

    // Release resources
    detector.Release();
    return 0;
}
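
The two numerical tricks inside PostProcess() and ProcessFeatureMap() are easier to see in isolation. The sketch below is my own Python restatement, not the deployed code: logit() is the inverse-sigmoid threshold, and dfl_expectation() is the softmax-weighted bin index used for each box side. Subtracting the maximum before exponentiating is a standard stabilization that I have swapped in for the C++ code's clamping to ±88.

```python
import math


def logit(p: float) -> float:
    """Inverse sigmoid: compare raw logits to this instead of applying sigmoid to every cell."""
    return -math.log(1.0 / p - 1.0)


def dfl_expectation(bin_logits) -> float:
    """Expected bin index under softmax(bin_logits)."""
    peak = max(bin_logits)
    weights = [math.exp(v - peak) for v in bin_logits]
    total = sum(weights)
    return sum(i * w for i, w in enumerate(weights)) / total


def decode_box(ltrb_logits, grid_x: int, grid_y: int, stride: float):
    """Turn four per-side bin distributions into (x1, y1, x2, y2) in input pixels."""
    left, top, right, bottom = (dfl_expectation(bins) for bins in ltrb_logits)
    x1 = (grid_x + 0.5 - left) * stride
    y1 = (grid_y + 0.5 - top) * stride
    x2 = (grid_x + 0.5 + right) * stride
    y2 = (grid_y + 0.5 + bottom) * stride
    return x1, y1, x2, y2


uniform = [[0.0] * 16 for _ in range(4)]  # every bin equally likely
print(logit(0.25), dfl_expectation(uniform[0]))
print(decode_box(uniform, grid_x=10, grid_y=10, stride=8))
```

Thresholding in logit space means the sigmoid only has to be evaluated for the few cells that survive, which is part of why the CPU-side postprocessing stays cheap.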

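
Boxes and keypoints come out of the model in letterboxed 640x640 coordinates, and PrintResults() and DrawResults() map them back with x_shift_, y_shift_ and the scale. A small sketch of that round trip, using hypothetical helper names and a 1280x720 frame as an assumed example:

```python
def letterbox_params(img_w: int, img_h: int, input_w: int = 640, input_h: int = 640):
    """Scale and padding offsets for an aspect-preserving resize into the model input."""
    scale = min(input_h / img_h, input_w / img_w)
    new_w = int(img_w * scale)
    new_h = int(img_h * scale)
    x_shift = (input_w - new_w) // 2
    y_shift = (input_h - new_h) // 2
    return scale, x_shift, y_shift


def to_original(x: float, y: float, scale: float, x_shift: int, y_shift: int):
    """Map a point from letterboxed coordinates back to the source image."""
    return (x - x_shift) / scale, (y - y_shift) / scale


scale, x_shift, y_shift = letterbox_params(1280, 720)
print(scale, x_shift, y_shift)
print(to_original(320, 320, scale, x_shift, y_shift))
```

The same mapping applies to the keypoints, which matters later for PnP because the solver needs pixel coordinates in the original camera image rather than in the padded model input.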