Introduction to MediaPipe
MediaPipe is an open-source framework developed by Google Research that helps developers easily build, test, and deploy complex multimodal, multi-task machine learning pipelines. It is particularly good at real-time processing and analysis of audio, video, and other multimedia data. Its key features and components include:
Key Features
- Cross-platform support: MediaPipe runs on multiple platforms, including desktop, mobile devices, and the web, so developers can easily deploy models to different targets.
- Efficient real-time processing: MediaPipe is highly optimized and can process data in real time even on resource-constrained devices, which makes it especially suitable for mobile and embedded systems.
- Modular design: MediaPipe uses graphs to organize and connect processing modules, letting developers flexibly combine and reuse components.
- Rich prebuilt solutions: MediaPipe ships many prebuilt solutions, such as face detection, hand tracking, and pose estimation, which developers can use directly to build applications quickly.
Main Components
- Graph: The core of MediaPipe is its graph structure, which defines the data flow and how processing modules are connected. Each graph consists of nodes and edges: nodes represent concrete processing modules, and edges represent data flowing between nodes.
- Nodes: Nodes are the basic units of a graph and represent individual processing operations. MediaPipe provides many built-in nodes, such as input/output nodes, image-processing nodes, and machine learning inference nodes.
- Packets: Packets are the units of data transmitted through a graph; nodes communicate by sending and receiving packets. A packet can carry many kinds of data, such as image frames, audio signals, or detection results.
- Computer vision solutions: MediaPipe offers many prebuilt, highly optimized computer vision solutions ready for real-time use, including face detection, hand tracking, pose estimation, and object detection.
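The graph/node/packet model above can be illustrated with a tiny, self-contained sketch. Note this is not the MediaPipe API (real MediaPipe graphs are declared in graph configs and the nodes are calculators); the `Packet`, `Node`, and `run_graph` names below are invented purely to make the dataflow idea concrete:

```python
from dataclasses import dataclass
from typing import Any, Callable, List

@dataclass
class Packet:
    # A timestamped unit of data, analogous to a MediaPipe packet.
    timestamp: int
    data: Any

class Node:
    # One processing step; a stand-in for a graph node.
    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def process(self, packet: Packet) -> Packet:
        return Packet(packet.timestamp, self.fn(packet.data))

def run_graph(nodes: List[Node], packets: List[Packet]) -> List[Packet]:
    # Push each packet through the chain of nodes in order,
    # mimicking data flowing along the edges of a graph.
    out = []
    for p in packets:
        for node in nodes:
            p = node.process(p)
        out.append(p)
    return out

# A two-node "graph": normalize a pixel value, then threshold it.
graph = [Node(lambda v: v / 255.0), Node(lambda v: v > 0.5)]
results = run_graph(graph, [Packet(0, 200), Packet(1, 30)])
print([(p.timestamp, p.data) for p in results])  # → [(0, True), (1, False)]
```

The point of the design is the same as in MediaPipe: each node only sees packets, so processing steps can be rearranged and reused without changing one another.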
Common Use Cases
- Pose estimation: MediaPipe can detect and track body keypoints (shoulders, elbows, knees, and so on) in real time and estimate the body's pose. This is very useful for sports training, motion capture, and augmented reality.
- Hand tracking: MediaPipe detects and tracks hand keypoints, enabling gesture recognition and hand-motion analysis. This is widely used in gesture control, virtual reality, and handwriting input.
- Face detection: MediaPipe provides efficient face detection and keypoint tracking, applicable to facial recognition, expression analysis, and virtual makeup.
- Object detection: MediaPipe also offers real-time object detection solutions for areas such as surveillance, autonomous driving, and smart homes.
Example Code
Below is a simple example of pose estimation with MediaPipe:
import cv2
import mediapipe as mp

# Initialize mediapipe pose class.
mp_pose = mp.solutions.pose
pose = mp_pose.Pose()

# Initialize mediapipe drawing class, useful for annotation.
mp_drawing = mp.solutions.drawing_utils

# Load the video file or start webcam capture.
cap = cv2.VideoCapture(0)  # Use 0 for webcam, or provide a video file path

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Convert the BGR image to RGB.
    image_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # Process the image and detect the pose.
    result = pose.process(image_rgb)

    # Draw the pose annotation on the image.
    if result.pose_landmarks:
        mp_drawing.draw_landmarks(frame, result.pose_landmarks, mp_pose.POSE_CONNECTIONS)

    # Display the frame with pose landmarks.
    cv2.imshow('Pose Estimation', frame)

    # Break the loop if 'q' is pressed.
    if cv2.waitKey(10) & 0xFF == ord('q'):
        break

# Release the video capture object and close display window.
cap.release()
cv2.destroyAllWindows()
This code uses MediaPipe's pose estimation to read a video stream and draw the body's keypoints in real time. You can capture body poses live from a webcam, or process a pre-recorded video file.
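A common next step after detecting landmarks is measuring joint angles (for example, elbow flexion for sports training). The sketch below computes a 2D joint angle from three landmark positions; in MediaPipe Pose, the left shoulder, elbow, and wrist are landmark indices 11, 13, and 15, and each landmark exposes normalized `x`/`y` fields you would pass in here. The coordinate values used are made up for illustration:

```python
import math

def joint_angle(a, b, c):
    # Angle at point b (in degrees) formed by segments b→a and b→c.
    # Each point is an (x, y) tuple of normalized landmark coordinates.
    ang = math.degrees(
        math.atan2(c[1] - b[1], c[0] - b[0]) - math.atan2(a[1] - b[1], a[0] - b[0])
    )
    ang = abs(ang)
    return 360 - ang if ang > 180 else ang

# Hypothetical shoulder, elbow, and wrist positions for a right-angle bend.
shoulder, elbow, wrist = (0.5, 0.2), (0.5, 0.4), (0.7, 0.4)
print(joint_angle(shoulder, elbow, wrist))  # → 90.0
```

With real detections you would read the points as `(lm[13].x, lm[13].y)` and so on from `result.pose_landmarks.landmark`.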
Example 1: read a video stream and draw the skeleton keypoints.
import cv2
import mediapipe as mp

# Initialize mediapipe pose class.
mp_pose = mp.solutions.pose
pose = mp_pose.Pose()

# Initialize mediapipe drawing class, useful for annotation.
mp_drawing = mp.solutions.drawing_utils

# Load the video file.
cap = cv2.VideoCapture('D:/basketball.mp4')

# Check if the video is opened successfully.
if not cap.isOpened():
    print("Error: Could not open video.")
    exit()

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        print("Reached the end of the video.")
        break

    # Convert the BGR image to RGB.
    image_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # Process the image and detect the pose.
    result = pose.process(image_rgb)

    # Draw the pose annotation on the image.
    if result.pose_landmarks:
        mp_drawing.draw_landmarks(frame, result.pose_landmarks, mp_pose.POSE_CONNECTIONS)

    # Display the frame with pose landmarks.
    cv2.imshow('Pose Estimation', frame)

    # Break the loop if 'q' is pressed.
    if cv2.waitKey(10) & 0xFF == ord('q'):
        break

# Release the video capture object and close display window.
cap.release()
cv2.destroyAllWindows()
The result is shown below:
Example 2: pose estimation on a video stream with 3D plotting. The code is as follows:
import cv2
import mediapipe as mp
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
from matplotlib.animation import FuncAnimation

# Initialize mediapipe pose class.
mp_pose = mp.solutions.pose
pose = mp_pose.Pose()

# Initialize mediapipe drawing class, useful for annotation.
mp_drawing = mp.solutions.drawing_utils

# Load the video file.
cap = cv2.VideoCapture('D:/basketball.mp4')

# Check if the video is opened successfully.
if not cap.isOpened():
    print("Error: Could not open video.")
    exit()

fig = plt.figure(figsize=(10, 5))
ax2d = fig.add_subplot(121)
ax3d = fig.add_subplot(122, projection='3d')

def update(frame_number):
    ret, frame = cap.read()
    if not ret:
        print("Reached the end of the video.")
        return

    # Convert the BGR image to RGB.
    image_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # Process the image and detect the pose.
    result = pose.process(image_rgb)

    # Clear the previous plots
    ax2d.clear()
    ax3d.clear()

    # Draw the pose annotation on the image.
    if result.pose_landmarks:
        mp_drawing.draw_landmarks(frame, result.pose_landmarks, mp_pose.POSE_CONNECTIONS)

        # Extract the landmark points.
        landmarks = result.pose_landmarks.landmark
        xs = [landmark.x for landmark in landmarks]
        ys = [landmark.y for landmark in landmarks]
        zs = [landmark.z for landmark in landmarks]

        # Plot 2D image
        ax2d.imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        ax2d.set_title('Pose Estimation')
        ax2d.axis('off')

        # Plot 3D landmarks
        ax3d.scatter(xs, ys, zs, c='blue', marker='o')
        ax3d.set_xlim([0, 1])
        ax3d.set_ylim([0, 1])
        ax3d.set_zlim([-1, 1])
        ax3d.set_xlabel('X')
        ax3d.set_ylabel('Y')
        ax3d.set_zlabel('Z')
        ax3d.set_title('3D Pose Landmarks')

ani = FuncAnimation(fig, update, interval=10)
plt.show()

cap.release()
cv2.destroyAllWindows()
The result is shown below:
To join the 3D skeleton points, you can use mpl_toolkits.mplot3d.art3d.Line3DCollection to draw the bone connections. You need to define the point pairs that form each connection and use them to draw line segments in the 3D plot. Here is the updated code:
import cv2
import mediapipe as mp
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from mpl_toolkits.mplot3d.art3d import Line3DCollection
import numpy as np
from matplotlib.animation import FuncAnimation

# Initialize mediapipe pose class.
mp_pose = mp.solutions.pose
pose = mp_pose.Pose()

# Initialize mediapipe drawing class, useful for annotation.
mp_drawing = mp.solutions.drawing_utils

# Load the video file.
cap = cv2.VideoCapture('D:/basketball.mp4')

# Check if the video is opened successfully.
if not cap.isOpened():
    print("Error: Could not open video.")
    exit()

fig = plt.figure(figsize=(10, 5))
ax2d = fig.add_subplot(121)
ax3d = fig.add_subplot(122, projection='3d')

def update(frame_number):
    ret, frame = cap.read()
    if not ret:
        print("Reached the end of the video.")
        return

    # Convert the BGR image to RGB.
    image_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # Process the image and detect the pose.
    result = pose.process(image_rgb)

    # Clear the previous plots
    ax2d.clear()
    ax3d.clear()

    # Draw the pose annotation on the image.
    if result.pose_landmarks:
        mp_drawing.draw_landmarks(frame, result.pose_landmarks, mp_pose.POSE_CONNECTIONS)

        # Extract the landmark points.
        landmarks = result.pose_landmarks.landmark
        xs = [landmark.x for landmark in landmarks]
        ys = [landmark.y for landmark in landmarks]
        zs = [landmark.z for landmark in landmarks]

        # Define the connections between landmarks
        connections = [
            (0, 1), (1, 2), (2, 3), (3, 7), (0, 4), (4, 5), (5, 6), (6, 8),
            (9, 10), (11, 12), (11, 13), (13, 15), (15, 17), (15, 19), (15, 21),
            (17, 19), (12, 14), (14, 16), (16, 18), (16, 20), (16, 22), (18, 20),
            (11, 23), (12, 24), (23, 24), (23, 25), (24, 26), (25, 27), (26, 28),
            (27, 29), (28, 30), (29, 31), (30, 32)
        ]

        # Create a list of 3D lines
        lines = [[(xs[start], ys[start], zs[start]), (xs[end], ys[end], zs[end])]
                 for start, end in connections]

        # Plot 2D image
        ax2d.imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        ax2d.set_title('Pose Estimation')
        ax2d.axis('off')

        # Plot 3D landmarks and connections
        ax3d.scatter(xs, ys, zs, c='blue', marker='o')
        ax3d.add_collection3d(Line3DCollection(lines, colors='blue', linewidths=2))
        ax3d.set_xlim([0, 1])
        ax3d.set_ylim([0, 1])
        ax3d.set_zlim([-1, 1])
        ax3d.set_xlabel('X')
        ax3d.set_ylabel('Y')
        ax3d.set_zlabel('Z')
        ax3d.set_title('3D Pose Landmarks')

ani = FuncAnimation(fig, update, interval=10)
plt.show()

cap.release()
cv2.destroyAllWindows()
In this code, we define a connections list containing the pairs of skeleton points to join. We then build a lines list holding the 3D line segments for those connections and add them to the 3D plot with ax3d.add_collection3d(Line3DCollection(lines, colors='blue', linewidths=2)). After running this script, the 3D plot shows not only the skeleton points but also the lines joining them, forming a complete skeleton.
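A quick sanity check on a connection table like this can catch typos before they cause an IndexError at draw time. The standalone snippet below (copying the list from the code above) verifies that every index is a valid MediaPipe Pose landmark and that all 33 landmarks are covered:

```python
# The same landmark pairs used for drawing the skeleton above.
connections = [
    (0, 1), (1, 2), (2, 3), (3, 7), (0, 4), (4, 5), (5, 6), (6, 8),
    (9, 10), (11, 12), (11, 13), (13, 15), (15, 17), (15, 19), (15, 21),
    (17, 19), (12, 14), (14, 16), (16, 18), (16, 20), (16, 22), (18, 20),
    (11, 23), (12, 24), (23, 24), (23, 25), (24, 26), (25, 27), (26, 28),
    (27, 29), (28, 30), (29, 31), (30, 32)
]

# Every index must be a valid MediaPipe Pose landmark (0–32) ...
used = {i for pair in connections for i in pair}
assert used <= set(range(33))

# ... and every landmark from 0 to 32 appears at least once,
# so the drawn skeleton has no orphan joints.
print(len(connections), min(used), max(used))  # → 33 0 32
```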
The result is shown below:
In the code above, the skeleton in the 3D plot appears upside down. You can adjust the 3D axes so the skeleton displays as a normal human pose, by setting the axis ranges and directions. The modified code below adjusts the axis ranges and orientation so the skeleton displays correctly:
import cv2
import mediapipe as mp
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from mpl_toolkits.mplot3d.art3d import Line3DCollection
import numpy as np
from matplotlib.animation import FuncAnimation

# Initialize mediapipe pose class.
mp_pose = mp.solutions.pose
pose = mp_pose.Pose()

# Initialize mediapipe drawing class, useful for annotation.
mp_drawing = mp.solutions.drawing_utils

# Load the video file.
cap = cv2.VideoCapture('D:/basketball.mp4')

# Check if the video is opened successfully.
if not cap.isOpened():
    print("Error: Could not open video.")
    exit()

fig = plt.figure(figsize=(10, 5))
ax2d = fig.add_subplot(121)
ax3d = fig.add_subplot(122, projection='3d')

def update(frame_number):
    ret, frame = cap.read()
    if not ret:
        print("Reached the end of the video.")
        return

    # Convert the BGR image to RGB.
    image_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # Process the image and detect the pose.
    result = pose.process(image_rgb)

    # Clear the previous plots
    ax2d.clear()
    ax3d.clear()

    # Draw the pose annotation on the image.
    if result.pose_landmarks:
        mp_drawing.draw_landmarks(frame, result.pose_landmarks, mp_pose.POSE_CONNECTIONS)

        # Extract the landmark points.
        landmarks = result.pose_landmarks.landmark
        xs = [landmark.x for landmark in landmarks]
        ys = [landmark.y for landmark in landmarks]
        zs = [-landmark.z for landmark in landmarks]  # Negate the z-axis for better visualization

        # Define the connections between landmarks
        connections = [
            (0, 1), (1, 2), (2, 3), (3, 7), (0, 4), (4, 5), (5, 6), (6, 8),
            (9, 10), (11, 12), (11, 13), (13, 15), (15, 17), (15, 19), (15, 21),
            (17, 19), (12, 14), (14, 16), (16, 18), (16, 20), (16, 22), (18, 20),
            (11, 23), (12, 24), (23, 24), (23, 25), (24, 26), (25, 27), (26, 28),
            (27, 29), (28, 30), (29, 31), (30, 32)
        ]

        # Create a list of 3D lines
        lines = [[(xs[start], ys[start], zs[start]), (xs[end], ys[end], zs[end])]
                 for start, end in connections]

        # Plot 2D image
        ax2d.imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        ax2d.set_title('Pose Estimation')
        ax2d.axis('off')

        # Plot 3D landmarks and connections
        ax3d.scatter(xs, ys, zs, c='blue', marker='o')
        ax3d.add_collection3d(Line3DCollection(lines, colors='blue', linewidths=2))
        ax3d.set_xlim([0, 1])
        ax3d.set_ylim([1, 0])  # Flip the y-axis for better visualization
        ax3d.set_zlim([1, -1])
        ax3d.set_xlabel('X')
        ax3d.set_ylabel('Y')
        ax3d.set_zlabel('Z')
        ax3d.set_title('3D Pose Landmarks')

ani = FuncAnimation(fig, update, interval=10)
plt.show()

cap.release()
cv2.destroyAllWindows()
In this code:
- Negating the zs coordinates (zs = [-landmark.z for landmark in landmarks]) makes the Z direction of the skeleton points match the expected orientation.
- Setting ax3d.set_ylim([1, 0]) flips the Y axis to better match common viewing conventions.
After running this script, the skeleton in the 3D plot should appear as a normal human pose.
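Note that the two fixes act differently: negating zs changes the data itself, while set_ylim([1, 0]) only reverses the display direction and leaves the data untouched. The small sketch below (with made-up landmark values) shows the data-side transform in isolation:

```python
import numpy as np

# Hypothetical normalized landmarks as rows of (x, y, z); values are made up.
pts = np.array([
    [0.50, 0.10, -0.20],  # a head-level point, closer to the camera (negative z)
    [0.50, 0.90,  0.05],  # a foot-level point, slightly behind the hips
])

# Negate z only, exactly what the list comprehension in the script does.
flipped = pts * np.array([1.0, 1.0, -1.0])

# x and y are unchanged; the depth ordering of the two points is reversed.
print(flipped[:, 2])
```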
The result is shown below: