A Deep Dive into the Supervision Library: An AI Vision Assistant for Python

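In computer vision and machine learning, data handling and result visualization often decide whether a project succeeds. This article takes a close look at Supervision, a Python library designed to simplify the workflow of AI vision projects.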

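What is Supervision?

Supervision is an open-source Python library that provides utilities for computer vision projects, particularly object detection, segmentation, and tracking. Its API integrates with popular machine learning frameworks such as YOLO and Detectron2, which shortens the path from model inference to visualized results.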

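Core Features at a Glance

Supervision's main features include, among others:

Annotation and visualization (bounding boxes, masks, labels, and more)
Dataset loading and format conversion
Detection filtering and post-processing
Video stream processing
Performance analysis tools
Integration with several computer vision frameworks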

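Installing Supervision

Supervision can be installed with pip:

pip install supervision

If you need optional features such as video processing support, install the package with an extra. The names of the available extras depend on the release, so confirm the one you need in the project's installation notes before running:

pip install "supervision[full]"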

Basic Usage Example

Let's start with a simple example that visualizes detection results with Supervision.

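import cv2
import supervision as sv
from ultralytics import YOLO

# Load a YOLOv8 model
model = YOLO("yolov8n.pt")

# Read the image
image = cv2.imread("image.jpg")

# Run inference
results = model(image)[0]
# from_ultralytics replaces the deprecated from_yolov8 in recent releases
detections = sv.Detections.from_ultralytics(results)

# Create the annotators; in recent releases BoxAnnotator draws boxes only
# and LabelAnnotator draws the text
box_annotator = sv.BoxAnnotator()
label_annotator = sv.LabelAnnotator()

# Build one "class confidence" label per detection
labels = [
    f"{model.model.names[class_id]} {confidence:0.2f}"
    for class_id, confidence in zip(detections.class_id, detections.confidence)
]

# Annotate a copy of the image
annotated_image = box_annotator.annotate(scene=image.copy(), detections=detections)
annotated_image = label_annotator.annotate(
    scene=annotated_image, detections=detections, labels=labels
)

# Show the result
sv.plot_image(annotated_image)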
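The label-building step can be isolated from the model and tested on its own. The following is a minimal pure-Python sketch, not part of Supervision: `build_labels` and the `names` mapping are hypothetical names used for illustration, and the fallback to the raw class id for unknown classes is an assumption about what you might want.

```python
def build_labels(class_ids, confidences, names):
    """Return one 'name confidence' string per detection.

    `names` maps class ids to readable names; unknown ids fall back
    to the id itself so that a label is always produced.
    """
    return [
        f"{names.get(class_id, str(class_id))} {confidence:0.2f}"
        for class_id, confidence in zip(class_ids, confidences)
    ]


# Hypothetical subset of a class-name mapping
names = {0: "person", 2: "car"}
labels = build_labels([0, 2, 7], [0.912, 0.5, 0.333], names)
print(labels)  # ['person 0.91', 'car 0.50', '7 0.33']
```

Keeping this logic in a small function makes it easy to change the label format in one place when the same labels are reused for images and video frames.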

Working with Detection Results

The Detections class is the core of Supervision's result handling. Here is how to filter and inspect detections.

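# Keep only detections with confidence above 0.7
high_confidence_detections = detections[detections.confidence > 0.7]

# Keep only one class (class 0 is "person" in COCO-trained models)
person_detections = detections[detections.class_id == 0]

# Print the box coordinates in (x_min, y_min, x_max, y_max) order
for bbox in person_detections.xyxy:
    print(f"Bounding box: {bbox}")

# Compute the center point of each box
centers = person_detections.get_anchors_coordinates(anchor=sv.Position.CENTER)
print(f"Centers: {centers}")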
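To make the filtering and center computation concrete, here is a small pure-Python sketch of the same two steps on plain tuples. It only illustrates the arithmetic and is not how Supervision implements it; `filter_by_confidence` and `box_center` are hypothetical helper names, and the sample boxes are made up.

```python
def filter_by_confidence(boxes, confidences, threshold):
    """Keep the boxes whose confidence is strictly above the threshold."""
    return [box for box, conf in zip(boxes, confidences) if conf > threshold]


def box_center(box):
    """Center of a box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


# Made-up detections for illustration
boxes = [(10, 20, 110, 220), (0, 0, 50, 50), (30, 40, 70, 100)]
confidences = [0.95, 0.40, 0.80]

kept = filter_by_confidence(boxes, confidences, threshold=0.7)
centers = [box_center(box) for box in kept]
print(kept)     # [(10, 20, 110, 220), (30, 40, 70, 100)]
print(centers)  # [(60.0, 120.0), (50.0, 70.0)]
```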

Advanced Annotation

Supervision ships several annotation styles that cover different visualization needs.

# Create different kinds of annotators
box_annotator = sv.BoxAnnotator(thickness=2)
mask_annotator = sv.MaskAnnotator()  # only has an effect when detections carry masks
label_annotator = sv.LabelAnnotator(text_thickness=1, text_scale=0.5)
circle_annotator = sv.CircleAnnotator()  # draws a circle around each box

# Combine several annotators on the same image
annotated_image = box_annotator.annotate(scene=image.copy(), detections=detections)
annotated_image = mask_annotator.annotate(scene=annotated_image, detections=detections)
annotated_image = label_annotator.annotate(scene=annotated_image, detections=detections)
annotated_image = circle_annotator.annotate(scene=annotated_image, detections=detections)

sv.plot_image(annotated_image)

Video Processing

Supervision streamlines video pipelines so that processing a stream looks much like processing a single image.

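# Read the video metadata and create a frame generator
video_info = sv.VideoInfo.from_video_path("video.mp4")
frame_generator = sv.get_video_frames_generator("video.mp4")

# Initialize the tracker
byte_tracker = sv.ByteTrack()

# Process every frame and write the annotated result
with sv.VideoSink("output.mp4", video_info) as sink:
    for frame in frame_generator:
        results = model(frame)[0]
        detections = sv.Detections.from_ultralytics(results)
        detections = byte_tracker.update_with_detections(detections)

        # Labels must be rebuilt for every frame
        labels = [
            f"#{tracker_id} {model.model.names[class_id]}"
            for tracker_id, class_id in zip(detections.tracker_id, detections.class_id)
        ]
        annotated_frame = box_annotator.annotate(scene=frame.copy(), detections=detections)
        annotated_frame = label_annotator.annotate(
            scene=annotated_frame, detections=detections, labels=labels
        )
        sink.write_frame(annotated_frame)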
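Trackers in the ByteTrack family associate detections across frames largely by how much their boxes overlap. The sketch below shows only that overlap measure, intersection over union (IoU), in plain Python. It is an illustration rather than Supervision's implementation, and a full tracker also adds motion prediction and staged matching that are not shown here.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap, clamped to zero when boxes are disjoint
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


previous_box = (0, 0, 100, 100)   # box of a track in the previous frame
candidate_box = (50, 0, 150, 100)  # detection in the current frame
overlap = iou(previous_box, candidate_box)
print(round(overlap, 3))  # 0.333
```

A detection whose IoU with an existing track is high is a good candidate to inherit that track's id, which is why the tracker id stays stable while an object moves slowly between frames.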

Dataset Tools

Supervision also provides convenient tools for loading, inspecting, and converting datasets.

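from itertools import islice

# Load a dataset in COCO format
dataset = sv.DetectionDataset.from_coco(
    images_directory_path="train/images",
    annotations_path="train/annotations.json",
)

# Annotate four samples and show them in a 2x2 grid.
# Iterating the dataset is assumed to yield (path or name, image, detections).
annotated_samples = [
    box_annotator.annotate(scene=sample_image.copy(), detections=sample_detections)
    for _, sample_image, sample_detections in islice(dataset, 4)
]
sv.plot_images_grid(images=annotated_samples, grid_size=(2, 2), size=(16, 16))

# Convert to YOLO format
dataset.as_yolo(
    images_directory_path="yolo/images",
    annotations_directory_path="yolo/labels",
    data_yaml_path="yolo/data.yaml",
)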
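The COCO to YOLO conversion changes how each box is encoded. COCO stores a box as absolute `[x_min, y_min, width, height]`, while YOLO label files store a normalized `(x_center, y_center, width, height)`. The following sketch shows that arithmetic for a single box; `coco_to_yolo` is a hypothetical helper for illustration, not a Supervision function.

```python
def coco_to_yolo(bbox, image_width, image_height):
    """Convert a COCO [x_min, y_min, w, h] box to normalized YOLO (cx, cy, w, h)."""
    x_min, y_min, width, height = bbox
    return (
        (x_min + width / 2) / image_width,
        (y_min + height / 2) / image_height,
        width / image_width,
        height / image_height,
    )


# A 200x100 box whose top-left corner is at (100, 50) in a 640x480 image
yolo_box = coco_to_yolo((100, 50, 200, 100), image_width=640, image_height=480)
print([round(value, 4) for value in yolo_box])  # [0.3125, 0.2083, 0.3125, 0.2083]
```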

Advanced Analytics

Supervision also includes higher-level analysis tools such as zone counting and heat map generation.

import numpy as np

# Define the region of interest
polygon = np.array([[100, 100], [300, 100], [300, 300], [100, 300]])
# Older releases additionally require frame_resolution_wh=(width, height)
zone = sv.PolygonZone(polygon=polygon)

# Create the analysis annotators
zone_annotator = sv.PolygonZoneAnnotator(zone=zone, color=sv.Color.RED)
heat_map_annotator = sv.HeatMapAnnotator()  # accumulates the heat map internally

# A generator can be consumed only once, so create a fresh one
frame_generator = sv.get_video_frames_generator("video.mp4")

with sv.VideoSink("analysis_output.mp4", video_info) as sink:
    for frame in frame_generator:
        results = model(frame)[0]
        detections = sv.Detections.from_ultralytics(results)

        # Update the zone count
        zone.trigger(detections)

        # Draw the heat map, then the boxes and the zone on top of it
        annotated_frame = heat_map_annotator.annotate(scene=frame.copy(), detections=detections)
        annotated_frame = box_annotator.annotate(scene=annotated_frame, detections=detections)
        annotated_frame = zone_annotator.annotate(scene=annotated_frame)
        sink.write_frame(annotated_frame)
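Zone counting comes down to testing whether an anchor point of each box lies inside the polygon. Here is a minimal pure-Python sketch of that idea using ray casting. It assumes the bottom-center of the box as the anchor, which is a choice made for this illustration (the anchor is configurable in the library), and `point_in_polygon` and `count_in_zone` are hypothetical helpers rather than Supervision's implementation.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if the point lies strictly inside the polygon."""
    x, y = point
    inside = False
    vertex_count = len(polygon)
    for i in range(vertex_count):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % vertex_count]
        # Does a horizontal ray from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_in_zone(boxes, polygon):
    """Count boxes whose bottom-center anchor falls inside the polygon."""
    count = 0
    for x_min, y_min, x_max, y_max in boxes:
        anchor = ((x_min + x_max) / 2, y_max)
        if point_in_polygon(anchor, polygon):
            count += 1
    return count


zone_polygon = [(100, 100), (300, 100), (300, 300), (100, 300)]
boxes = [(150, 120, 200, 250), (280, 200, 340, 290), (10, 10, 60, 90)]
in_zone = count_in_zone(boxes, zone_polygon)
print(in_zone)  # 1
```

Only the first box counts: its anchor (175, 250) is inside the square, while the anchors of the other two boxes, (310, 290) and (35, 90), fall outside it.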

Custom Annotation Styles

Supervision lets you customize how annotations look.

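# Custom colors and styles
class CustomColor:
    BOX = sv.Color(r=255, g=0, b=0)         # red box outline
    TEXT = sv.Color(r=255, g=255, b=255)    # white text
    BACKGROUND = sv.Color(r=0, g=0, b=0)    # black label background

# Box style and label style are configured on separate annotators.
# For rounded corners, see sv.RoundBoxAnnotator.
custom_box_annotator = sv.BoxAnnotator(color=CustomColor.BOX, thickness=3)
custom_label_annotator = sv.LabelAnnotator(
    color=CustomColor.BACKGROUND,
    text_color=CustomColor.TEXT,
    text_padding=2,
)

annotated_image = custom_box_annotator.annotate(scene=image.copy(), detections=detections)
annotated_image = custom_label_annotator.annotate(
    scene=annotated_image, detections=detections, labels=labels
)
sv.plot_image(annotated_image)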
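When choosing custom colors, keep in mind that OpenCV drawing functions expect channels in BGR order, while design tools usually give colors as RGB or hex strings. The sketch below shows the conversion in plain Python; `hex_to_rgb` and `rgb_to_bgr` are hypothetical helper names for illustration and are independent of Supervision's own color class.

```python
def hex_to_rgb(hex_color):
    """Parse '#RRGGBB' (the leading '#' is optional) into an (r, g, b) tuple."""
    value = hex_color.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected 6 hex digits, got {hex_color!r}")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_bgr(rgb):
    """Reorder an (r, g, b) tuple into OpenCV's (b, g, r) order."""
    r, g, b = rgb
    return (b, g, r)


orange_rgb = hex_to_rgb("#FF8000")
orange_bgr = rgb_to_bgr(orange_rgb)
print(orange_rgb)  # (255, 128, 0)
print(orange_bgr)  # (0, 128, 255)
```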

Integration with Different Frameworks

Supervision can build its Detections object from the output of several popular frameworks.

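# Create Detections objects from different frameworks

# From Ultralytics YOLOv8 (from_ultralytics replaces the deprecated from_yolov8)
detections = sv.Detections.from_ultralytics(results)

# From Detectron2
# outputs = predictor(image)
# detections = sv.Detections.from_detectron2(outputs)

# From MMDetection
# result = inference_detector(model, image)
# detections = sv.Detections.from_mmdetection(result)

# From TorchVision (connector name as given here; confirm it exists in your release)
# outputs = model(image)
# detections = sv.Detections.from_torchvision(outputs)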

Utility Functions

Supervision also includes a number of handy utility functions.

# Image processing
resized_image = sv.scale_image(image, scale_factor=0.5)
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # color conversion via OpenCV

# Video metadata
video_info = sv.VideoInfo.from_video_path("video.mp4")
print(video_info.total_frames, video_info.fps)

# File system
image_paths = sv.list_files_with_extensions(
    directory="dataset/images",
    extensions=["jpg", "png"],
)

# Drawing
image_with_text = sv.draw_text(
    scene=image.copy(),
    text="Sample Text",
    text_anchor=sv.Point(x=100, y=100),
    text_color=sv.Color.RED,
    text_scale=1.0,
    text_thickness=2,
    background_color=sv.Color.WHITE,
)

Performance Tips

Performance matters most when you process large amounts of data.

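# Run inference on batches of frames instead of one frame at a time
from itertools import islice

def frame_batches(frames, batch_size):
    """Yield lists of up to batch_size frames from a frame iterator."""
    iterator = iter(frames)
    while batch := list(islice(iterator, batch_size)):
        yield batch

with sv.VideoSink("output.mp4", video_info) as sink:
    frames = sv.get_video_frames_generator("video.mp4")
    for batch in frame_batches(frames, batch_size=4):
        # The Ultralytics model accepts a list of images and returns one result per image
        batch_results = model(batch)
        for frame, results in zip(batch, batch_results):
            detections = sv.Detections.from_ultralytics(results)
            annotated_frame = box_annotator.annotate(scene=frame.copy(), detections=detections)
            sink.write_frame(annotated_frame)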
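Before tuning a pipeline, it helps to measure its throughput. The following is a minimal standard-library sketch, assuming a per-frame callable; `measure_fps` is a hypothetical helper, and the lambda stands in for real inference and annotation work.

```python
import time


def measure_fps(process_frame, frames):
    """Run process_frame over frames and return (frame_count, frames_per_second)."""
    start = time.perf_counter()
    count = 0
    for frame in frames:
        process_frame(frame)
        count += 1
    elapsed = time.perf_counter() - start
    fps = count / elapsed if elapsed > 0 else float("inf")
    return count, fps


# Stand-in workload: replace the lambda with your inference + annotation step
frame_count, fps = measure_fps(lambda frame: sum(range(1000)), range(50))
print(f"{frame_count} frames at {fps:.1f} FPS")
```

Measuring the same video with and without a change, such as a different batch size or a smaller model, shows whether the change actually pays off.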

A Practical Application

Let's finish with a complete pedestrian counting example.

import numpy as np
import supervision as sv
from ultralytics import YOLO

# Initialize the model and tools
model = YOLO("yolov8n.pt")
byte_tracker = sv.ByteTrack()
box_annotator = sv.BoxAnnotator()
label_annotator = sv.LabelAnnotator()

# Define the counting zone
counting_zone = np.array([[200, 150], [800, 150], [800, 600], [200, 600]])
# Older releases additionally require frame_resolution_wh=(1280, 720)
zone = sv.PolygonZone(polygon=counting_zone)
zone_annotator = sv.PolygonZoneAnnotator(
    zone=zone,
    color=sv.Color.GREEN,
    text_color=sv.Color.BLACK,
    text_scale=2,
    text_thickness=4,
    text_padding=8,
)

# Process the video
video_info = sv.VideoInfo.from_video_path("input.mp4")
with sv.VideoSink("people_counting.mp4", video_info) as sink:
    for frame in sv.get_video_frames_generator("input.mp4"):
        # Inference
        results = model(frame)[0]
        detections = sv.Detections.from_ultralytics(results)

        # Keep only people (class_id 0)
        detections = detections[detections.class_id == 0]

        # Update the tracker
        detections = byte_tracker.update_with_detections(detections)

        # Update the zone count
        zone.trigger(detections)

        # Annotate
        labels = [
            f"#{tracker_id} {model.model.names[class_id]} {confidence:0.2f}"
            for tracker_id, class_id, confidence in zip(
                detections.tracker_id, detections.class_id, detections.confidence
            )
        ]
        annotated_frame = box_annotator.annotate(scene=frame.copy(), detections=detections)
        annotated_frame = label_annotator.annotate(
            scene=annotated_frame, detections=detections, labels=labels
        )
        annotated_frame = zone_annotator.annotate(scene=annotated_frame)
        sink.write_frame(annotated_frame)

# current_count is the number of people inside the zone in the last processed frame
print(f"People in the zone in the last frame: {zone.current_count}")
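The zone's current count describes a single frame. To count how many distinct people entered the zone over the whole video, one option is to collect the tracker ids that were ever inside it. The sketch below shows that bookkeeping in plain Python; `update_seen_ids` is a hypothetical helper, and the per-frame data is made up to stand in for tracker ids and the boolean in-zone mask that you would obtain for each frame.

```python
def update_seen_ids(seen_ids, tracker_ids, in_zone_flags):
    """Add to seen_ids every tracker id that is inside the zone in this frame."""
    for tracker_id, in_zone in zip(tracker_ids, in_zone_flags):
        if in_zone and tracker_id is not None:
            seen_ids.add(tracker_id)
    return seen_ids


# Made-up per-frame data: (tracker ids, whether each one is inside the zone)
frames = [
    ([1, 2], [True, False]),
    ([1, 2, 3], [True, True, False]),
    ([3], [True]),
]

seen_ids = set()
for tracker_ids, in_zone_flags in frames:
    update_seen_ids(seen_ids, tracker_ids, in_zone_flags)

print(len(seen_ids))  # 3
```

This count depends on the tracker keeping ids stable. If a person is lost and picked up again under a new id, that person is counted twice.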