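Object detection with RealSense, labeling each target's depth

All the relevant points are explained in the code comments, so go straight to the code.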
// This example is derived from the ssd_mobilenet_object_detection opencv demo
// and adapted to be used with Intel RealSense Cameras
// Please see https://github.com/opencv/opencv/blob/master/LICENSE

#include <opencv2/dnn.hpp>
#include <librealsense2/rs.hpp>
#include "../cv-helpers.hpp"

#include <iomanip>   // std::setprecision
#include <iostream>  // std::cerr
#include <sstream>   // std::ostringstream

const size_t inWidth      = 300;
const size_t inHeight     = 300;
const float WHRatio       = inWidth / (float)inHeight;
const float inScaleFactor = 0.007843f;
const float meanVal       = 127.5;
const char* classNames[]  = {"background",
                             "aeroplane", "bicycle", "bird", "boat",
                             "bottle", "bus", "car", "cat", "chair",
                             "cow", "diningtable", "dog", "horse",
                             "motorbike", "person", "pottedplant",
                             "sheep", "sofa", "train", "tvmonitor"};

int main(int argc, char** argv) try
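{
    using namespace cv;
    using namespace cv::dnn;
    using namespace rs2;

    Net net = readNetFromCaffe("MobileNetSSD_deploy.prototxt",
                               "MobileNetSSD_deploy.caffemodel");

    // Start streaming from Intel RealSense Camera
    //
    // The pipeline simplifies the interaction between the user, the device, and
    // the computer vision processing modules. The class abstracts camera
    // configuration, streaming, vision-module triggering, and threading, so the
    // application can focus on the computer vision output of the modules or on
    // the raw device data.
    // The pipeline can manage computer vision modules implemented as
    // processing blocks:
    //   - the pipeline is the consumer (caller) of the processing block interface;
    //   - the application is the consumer of the computer vision interface.
    pipeline pipe;

    // Start pipeline streaming with the default configuration.
    // The streaming loop captures samples from the device and delivers them to
    // the attached computer vision modules and processing blocks, according to
    // each module's requirements and threading model.
    // While streaming, the application can access the camera data by calling:
    //   - wait_for_frames()  (blocks until a frameset arrives)
    //   - poll_for_frames()  (non-blocking poll)
    // Conditions:
    //   - The loop runs until the pipeline is explicitly stopped.
    //   - The pipeline can be started only when it is not already started;
    //     otherwise an exception is thrown.
    // Returns:
    //   The actual pipeline device and stream profile that was successfully
    //   configured.
    // pipeline_profile rs2::pipeline::start()
    auto config = pipe.start();

    // The pipeline profile contains a device and a selection of its enabled
    // streams, each with a specific profile. It is the selection the pipeline
    // made from the available options under its configured filters and
    // conditions. The streams may belong to more than one sensor of the device.
    //
    // get_stream()
    //   stream_profile rs2::pipeline_profile::get_stream(rs2_stream stream_type,
    //                                                    int stream_index = -1) const
    //   Return the stream profile that is enabled for the specified stream in
    //   this profile.
    //   Parameters
    //     [in] stream_type   Stream type of the desired profile
    //     [in] stream_index  Stream index of the desired profile. -1 for any matching.
    //   Returns
    //     The first matching stream profile
    //
    // template<class T> rs2::stream_profile::as() const
    //   Template function, casting the instance as another class type
    //   Returns
    //     class instance - pointer or null.
    auto profile = config.get_stream(RS2_STREAM_COLOR)
                         .as<video_stream_profile>();

    // Create the align filter.
    // Alignment is performed between a depth image and another image:
    //   - To align the depth image to another stream, set align_to to that
    //     stream's type (for example RS2_STREAM_COLOR).
    //   - To align a non-depth image to the depth image, set align_to to
    //     RS2_STREAM_DEPTH.
    // At run time, the camera calibration and the frames' stream types are
    // determined on the fly from the first valid frameset passed to process().
    rs2::align align_to(RS2_STREAM_COLOR);

    Size cropSize;
    if (profile.width() / (float)profile.height() > WHRatio)
    {
        cropSize = Size(static_cast<int>(profile.height() * WHRatio),
                        profile.height());
    }
    else
    {
        cropSize = Size(profile.width(),
                        static_cast<int>(profile.width() / WHRatio));
    }

    Rect crop(Point((profile.width() - cropSize.width) / 2,
                    (profile.height() - cropSize.height) / 2),
              cropSize);

    const auto window_name = "Display Image";
    namedWindow(window_name, WINDOW_AUTOSIZE);

    while (getWindowProperty(window_name, WND_PROP_AUTOSIZE) >= 0)
    {
        // Wait for the next set of frames
        auto data = pipe.wait_for_frames();
        // Make sure the frames are spatially aligned:
        // run alignment on the input frameset and get back an aligned frameset
        data = align_to.process(data);

        auto color_frame = data.get_color_frame();
        auto depth_frame = data.get_depth_frame();

        // If we only received new depth frame,
        // but the color did not update, continue
        static int last_frame_number = 0;
        if (color_frame.get_frame_number() == last_frame_number) continue;
        last_frame_number = static_cast<int>(color_frame.get_frame_number());

        // Convert RealSense frame to OpenCV matrix:
        auto color_mat = frame_to_mat(color_frame);
        auto depth_mat = depth_frame_to_meters(depth_frame);

        /**
         * @brief Creates a 4-dimensional blob from an image. Optionally resizes
         *        and center-crops the image, subtracts mean values, scales the
         *        values, and swaps the blue and red channels.
         * @param image       input image (1, 3, or 4 channels)
         * @param scalefactor multiplier applied to the pixel values
         * @param size        spatial size of the output image
         * @param mean        scalar with the mean value subtracted from each
         *                    channel. If the image is in BGR order and
         *                    swapRB=true, give the values in
         *                    (mean-R, mean-G, mean-B) order
         * @param swapRB      whether to swap the first and last channels of a
         *                    3-channel image (BGR<->RGB)
         * @param crop        whether to center-crop the image after resizing
         * @param ddepth      depth of the output blob, CV_32F or CV_8U
         * @details If crop=true, the input image is resized, preserving its
         *          aspect ratio, so that one side equals the target size and the
         *          other is equal or larger, and is then center-cropped.
         *          If crop=false, the image is resized directly to the target
         *          size, without cropping and without preserving the aspect
         *          ratio.
         * @returns a 4-dimensional Mat in NCHW order
         *
         * In short, the call converts the input image into the 4D blob (NCHW)
         * the network expects and supports this preprocessing:
         *   - size:          resize the image
         *   - color:         channel swap (BGR<->RGB)
         *   - normalization: mean subtraction plus pixel scaling
         *   - crop policy:   center crop, or direct resize
         */
        // A single float converts to Scalar(127.5, 0, 0, 0), which would subtract
        // the mean from the first channel only, so pass it for all three channels.
        Mat inputBlob = blobFromImage(color_mat, inScaleFactor,
                                      Size(inWidth, inHeight),
                                      Scalar(meanVal, meanVal, meanVal),
                                      false); //Convert Mat to batch of images
        net.setInput(inputBlob, "data"); //set the network input

        /**
         * @brief Runs a forward pass to compute the output of the named layer.
         * @param outputName name of the layer whose output is needed (empty by
         *                   default, meaning the whole network runs and the
         *                   last layer's output is returned)
         * @return the first output blob of the specified layer
         * @details By default the forward pass runs over the whole network.
         *
         * Key points:
         *   - forward pass: executes the loaded network
         *   - flexible output: an intermediate layer or the output layer can be
         *     requested
         *   - layer dependencies and data transfer are handled automatically
         */
        Mat detection = net.forward("detection_out"); //compute output

        /**
         * In OpenCV's Mat class, size[] is an array holding the extent of each
         * dimension of a multi-dimensional matrix. For the 4D blob produced by
         * an object detection network such as SSD, size[2] has a specific
         * meaning.
         *
         * 1. Layout of the 4D blob
         *    A typical detection network (such as SSD) outputs a blob of shape
         *    [1, 1, N, 7]:
         *
         *    Index    Dimension  Meaning
         *    size[0]  dim 0      batch size, usually 1 (single-image inference)
         *    size[1]  dim 1      fixed at 1 (legacy layout, carries no information)
         *    size[2]  dim 2      number of detections (N)
         *    size[3]  dim 3      values per detection (fixed at 7)
         *
         * 2. How the code uses it
         *    // The network output is a 4D blob of shape [1, 1, N, 7]
         *    Mat detection = net.forward("detection_out");
         *    // View it as a 2D matrix: N rows x 7 columns
         *    Mat detectionMat(
         *        detection.size[2],      // rows = number of detections (N)
         *        detection.size[3],      // cols = values per detection (7)
         *        CV_32F,                 // 32-bit float data
         *        detection.ptr<float>()  // reuse the data pointer, no copy
         *    );
         *
         * 3. Why size[2] and size[3]?
         *    Mat is stored row-major, with dimensions indexed from outermost to
         *    innermost: size[0] is the outermost dimension (batch) and size[3]
         *    the innermost (individual values). For detection we only need the
         *    actual results (how many objects, and their values), so size[0] and
         *    size[1], which are both 1, are skipped.
         *
         * 4. The 7 values in each row
         *    detectionMat is an N x 7 matrix with one detection per row:
         *
         *    Col  Meaning        Type   Notes
         *    0    image ID       float  always 0 for single-image inference
         *    1    class ID       float  index into the classNames array
         *    2    confidence     float  detection confidence, 0 to 1
         *    3    top-left x     float  normalized coordinate (0 to 1)
         *    4    top-left y     float  normalized coordinate (0 to 1)
         *    5    bottom-right x float  normalized coordinate (0 to 1)
         *    6    bottom-right y float  normalized coordinate (0 to 1)
         *
         * 5. Usage example
         *    for (int i = 0; i < detectionMat.rows; i++) {         // every detection
         *        float confidence = detectionMat.at<float>(i, 2);  // confidence
         *        if (confidence > 0.5) {                           // drop weak detections
         *            int classId = static_cast<int>(detectionMat.at<float>(i, 1));
         *            // Convert normalized coordinates to pixel coordinates
         *            int x1 = detectionMat.at<float>(i, 3) * image.cols;
         *            int y1 = detectionMat.at<float>(i, 4) * image.rows;
         *            int x2 = detectionMat.at<float>(i, 5) * image.cols;
         *            int y2 = detectionMat.at<float>(i, 6) * image.rows;
         *            // Draw the detection box...
         *        }
         *    }
         *
         * 6. Differences in other models
         *    YOLOv3/v5: the output shape can differ (for example
         *    [1, 25200, 85]), so the parsing logic has to change.
         *    Classification models: the output is usually [1, N] (N classes),
         *    so there is no size[2] to handle.
         **/
        Mat detectionMat(detection.size[2], detection.size[3], CV_32F, detection.ptr<float>());

        // Crop both color and depth frames
        /**
         * Example: center-crop a 1080p image to 224x224 (a classifier input size)
         *   // Original image size: 1920x1080
         *   cv::Size profile(1920, 1080);
         *   // Target crop size: 224x224
         *   cv::Size cropSize(224, 224);
         *   // Centered crop region
         *   cv::Rect crop(cv::Point((1920 - 224)/2, (1080 - 224)/2),  // Point(848, 428)
         *                 cropSize);
         *   // Apply the crop
         *   cv::Mat originalImage = cv::imread("input.jpg");
         *   cv::Mat croppedImage = originalImage(crop);  // 224x224 image
         **/
        color_mat = color_mat(crop);
        depth_mat = depth_mat(crop);

        float confidenceThreshold = 0.8f;
        for(int i = 0; i < detectionMat.rows; i++)
        {
            float confidence = detectionMat.at<float>(i, 2);

            if(confidence > confidenceThreshold)
            {
                size_t objectClass = (size_t)(detectionMat.at<float>(i, 1));

                int xLeftBottom = static_cast<int>(detectionMat.at<float>(i, 3) * color_mat.cols);
                int yLeftBottom = static_cast<int>(detectionMat.at<float>(i, 4) * color_mat.rows);
                int xRightTop = static_cast<int>(detectionMat.at<float>(i, 5) * color_mat.cols);
                int yRightTop = static_cast<int>(detectionMat.at<float>(i, 6) * color_mat.rows);

                Rect object((int)xLeftBottom, (int)yLeftBottom,
                            (int)(xRightTop - xLeftBottom),
                            (int)(yRightTop - yLeftBottom));

                object = object  & Rect(0, 0, depth_mat.cols, depth_mat.rows);

                // Calculate mean depth inside the detection region
                // This is a very naive way to estimate objects depth
                // but it is intended to demonstrate how one might
                // use depth data in general
                //
                // cv::mean computes the per-channel mean of the input array
                // (usually an image); an optional mask can restrict the region.
                // It returns a cv::Scalar (4-element vector), one element per
                // channel:
                //   single-channel image: Scalar(mean_val, 0, 0, 0)
                //   3-channel BGR image:  Scalar(mean_B, mean_G, mean_R, 0)
                Scalar m = mean(depth_mat(object));

                std::ostringstream ss;
                ss << classNames[objectClass] << " ";
                ss << std::setprecision(2) << m[0] << " meters away";
                String conf(ss.str());

                rectangle(color_mat, object, Scalar(0, 255, 0));
                int baseLine = 0;
                Size labelSize = getTextSize(ss.str(), FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);

                auto center = (object.br() + object.tl())*0.5;
                center.x = center.x - labelSize.width / 2;

                rectangle(color_mat, Rect(Point(center.x, center.y - labelSize.height),
                    Size(labelSize.width, labelSize.height + baseLine)),
                    Scalar(255, 255, 255), FILLED);
                putText(color_mat, ss.str(), center,
                        FONT_HERSHEY_SIMPLEX, 0.5, Scalar(0,0,0));
            }
        }

        imshow(window_name, color_mat);
        if (waitKey(1) >= 0) break;
    }

    return EXIT_SUCCESS;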
}
catch (const rs2::error & e)
{
    std::cerr << "RealSense error calling " << e.get_failed_function() << "(" << e.get_failed_args() << "):\n    " << e.what() << std::endl;
    return EXIT_FAILURE;
}
catch (const std::exception& e)
{
    std::cerr << e.what() << std::endl;
    return EXIT_FAILURE;
}
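
The listing above decodes each 7-value detection row into a pixel rectangle and then averages every depth pixel inside it, which its own comment calls a naive estimate. The sketch below reproduces those two steps without OpenCV or librealsense so the arithmetic can be checked in isolation. `PixelBox`, `decodeDetections`, and `meanValidDepth` are hypothetical names introduced here, not part of either library. Skipping zero-valued depth pixels is an assumption of this sketch (zero is treated as "no measurement"), not something the original sample does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: one detection mapped to pixel coordinates.
struct PixelBox {
    int classId;
    float confidence;
    int x, y, width, height;
};

// Decode a flat, row-major N x 7 SSD output
// (imageId, classId, confidence, x1, y1, x2, y2; corners normalized to 0..1)
// into pixel boxes clipped to an imageCols x imageRows image.
std::vector<PixelBox> decodeDetections(const std::vector<float>& rows,
                                       int imageCols, int imageRows,
                                       float threshold)
{
    assert(rows.size() % 7 == 0);  // each detection occupies exactly 7 floats
    std::vector<PixelBox> boxes;
    for (std::size_t i = 0; i + 7 <= rows.size(); i += 7) {
        const float confidence = rows[i + 2];
        if (confidence <= threshold) continue;  // drop weak detections

        // Scale the normalized corners to pixels, then clamp to the image,
        // which has the same effect as intersecting with the image rectangle.
        const int x1 = std::max(0, std::min(static_cast<int>(rows[i + 3] * imageCols), imageCols));
        const int y1 = std::max(0, std::min(static_cast<int>(rows[i + 4] * imageRows), imageRows));
        const int x2 = std::max(0, std::min(static_cast<int>(rows[i + 5] * imageCols), imageCols));
        const int y2 = std::max(0, std::min(static_cast<int>(rows[i + 6] * imageRows), imageRows));
        if (x2 <= x1 || y2 <= y1) continue;  // nothing left after clipping

        boxes.push_back({static_cast<int>(rows[i + 1]), confidence,
                         x1, y1, x2 - x1, y2 - y1});
    }
    return boxes;
}

// Mean depth (meters) inside a box over a row-major depth image.
// Assumption: a value of 0 marks a pixel without a valid measurement,
// so such pixels are left out of the average.
float meanValidDepth(const std::vector<float>& depth, int imageCols,
                     const PixelBox& box)
{
    double sum = 0.0;
    int count = 0;
    for (int y = box.y; y < box.y + box.height; ++y) {
        for (int x = box.x; x < box.x + box.width; ++x) {
            const float d = depth[static_cast<std::size_t>(y) * imageCols + x];
            if (d > 0.0f) {
                sum += d;
                ++count;
            }
        }
    }
    return count > 0 ? static_cast<float>(sum / count) : 0.0f;
}
```

In the listing, `detectionMat.ptr<float>()` and `depth_mat.ptr<float>()` would be the natural sources for the two flat buffers, provided the matrices are continuous. A median over the valid pixels would resist background pixels inside the box even better than a mean, at the cost of a sort.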