Introduction to MediaPipe
MediaPipe is an open-source framework developed by Google Research that helps developers easily build, test, and deploy complex multimodal, multi-task machine learning pipelines. It is particularly good at real-time processing and analysis of audio, video, and other multimedia data. Its key features and components include:
Key Features
- Cross-platform support: MediaPipe runs on multiple platforms, including desktop, mobile devices, and the web, so developers can easily deploy models to different targets.
- Efficient real-time processing: MediaPipe is highly optimized and can process data in real time even on resource-constrained devices, which makes it especially suitable for mobile and embedded systems.
- Modular design: MediaPipe uses graphs to organize and connect processing modules, letting developers flexibly combine and reuse components.
- Rich prebuilt solutions: MediaPipe ships many prebuilt solutions, such as face detection, hand tracking, and pose estimation, which developers can use directly to build applications quickly.
Main Components
- Graph: The core of MediaPipe is its graph structure, which defines the data flow and how processing modules are connected. Each graph consists of nodes and edges: nodes represent concrete processing modules, and edges represent data flowing between nodes.
- Nodes: Nodes are the basic units of a graph and represent individual processing operations. MediaPipe provides many built-in nodes, such as input/output nodes, image-processing nodes, and machine learning inference nodes.
- Packets: Packets are the units of data transmitted through a graph; nodes communicate by sending and receiving packets. A packet can carry many kinds of data, such as image frames, audio signals, or detection results.
- Computer vision solutions: MediaPipe offers many prebuilt, highly optimized computer vision solutions ready for real-time use, including face detection, hand tracking, pose estimation, and object detection.
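The graph/node/packet model above can be illustrated with a tiny, self-contained sketch. Note this is not the MediaPipe API (real MediaPipe graphs are declared in graph configs and the nodes are calculators); the `Packet`, `Node`, and `run_graph` names below are invented purely to make the dataflow idea concrete:

```python
from dataclasses import dataclass
from typing import Any, Callable, List

@dataclass
class Packet:
    # A timestamped unit of data, analogous to a MediaPipe packet.
    timestamp: int
    data: Any

class Node:
    # One processing step; a stand-in for a graph node.
    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def process(self, packet: Packet) -> Packet:
        return Packet(packet.timestamp, self.fn(packet.data))

def run_graph(nodes: List[Node], packets: List[Packet]) -> List[Packet]:
    # Push each packet through the chain of nodes in order,
    # mimicking data flowing along the edges of a graph.
    out = []
    for p in packets:
        for node in nodes:
            p = node.process(p)
        out.append(p)
    return out

# A two-node "graph": normalize a pixel value, then threshold it.
graph = [Node(lambda v: v / 255.0), Node(lambda v: v > 0.5)]
results = run_graph(graph, [Packet(0, 200), Packet(1, 30)])
print([(p.timestamp, p.data) for p in results])  # → [(0, True), (1, False)]
```

The point of the design is the same as in MediaPipe: each node only sees packets, so processing steps can be rearranged and reused without changing one another.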
Common Use Cases
- Pose estimation: MediaPipe can detect and track body keypoints (shoulders, elbows, knees, and so on) in real time and estimate the body's pose. This is very useful for sports training, motion capture, and augmented reality.
- Hand tracking: MediaPipe detects and tracks hand keypoints, enabling gesture recognition and hand-motion analysis. This is widely used in gesture control, virtual reality, and handwriting input.
- Face detection: MediaPipe provides efficient face detection and keypoint tracking, applicable to facial recognition, expression analysis, and virtual makeup.
- Object detection: MediaPipe also offers real-time object detection solutions for areas such as surveillance, autonomous driving, and smart homes.
Example Code
Below is a simple example of pose estimation with MediaPipe:
import cv2
import mediapipe as mp

# Initialize mediapipe pose class.
mp_pose = mp.solutions.pose
pose = mp_pose.Pose()

# Initialize mediapipe drawing class, useful for annotation.
mp_drawing = mp.solutions.drawing_utils

# Load the video file or start webcam capture.
cap = cv2.VideoCapture(0)  # Use 0 for webcam, or provide a video file path

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Convert the BGR image to RGB.
    image_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # Process the image and detect the pose.
    result = pose.process(image_rgb)

    # Draw the pose annotation on the image.
    if result.pose_landmarks:
        mp_drawing.draw_landmarks(frame, result.pose_landmarks, mp_pose.POSE_CONNECTIONS)

    # Display the frame with pose landmarks.
    cv2.imshow('Pose Estimation', frame)

    # Break the loop if 'q' is pressed.
    if cv2.waitKey(10) & 0xFF == ord('q'):
        break

# Release the video capture object and close display window.
cap.release()
cv2.destroyAllWindows()
This code uses MediaPipe's pose estimation to read a video stream and draw the body's keypoints in real time. You can capture body poses live from a webcam, or process a pre-recorded video file.
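A common next step after detecting landmarks is measuring joint angles (for example, elbow flexion for sports training). The sketch below computes a 2D joint angle from three landmark positions; in MediaPipe Pose, the left shoulder, elbow, and wrist are landmark indices 11, 13, and 15, and each landmark exposes normalized `x`/`y` fields you would pass in here. The coordinate values used are made up for illustration:

```python
import math

def joint_angle(a, b, c):
    # Angle at point b (in degrees) formed by segments b→a and b→c.
    # Each point is an (x, y) tuple of normalized landmark coordinates.
    ang = math.degrees(
        math.atan2(c[1] - b[1], c[0] - b[0]) - math.atan2(a[1] - b[1], a[0] - b[0])
    )
    ang = abs(ang)
    return 360 - ang if ang > 180 else ang

# Hypothetical shoulder, elbow, and wrist positions for a right-angle bend.
shoulder, elbow, wrist = (0.5, 0.2), (0.5, 0.4), (0.7, 0.4)
print(joint_angle(shoulder, elbow, wrist))  # → 90.0
```

With real detections you would read the points as `(lm[13].x, lm[13].y)` and so on from `result.pose_landmarks.landmark`.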
Example 1: read a video stream and draw the skeleton keypoints.
import cv2
import mediapipe as mp

# Initialize mediapipe pose class.
mp_pose = mp.solutions.pose
pose = mp_pose.Pose()

# Initialize mediapipe drawing class, useful for annotation.
mp_drawing = mp.solutions.drawing_utils

# Load the video file.
cap = cv2.VideoCapture('D:/basketball.mp4')

# Check if the video is opened successfully.
if not cap.isOpened():
    print("Error: Could not open video.")
    exit()

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        print("Reached the end of the video.")
        break

    # Convert the BGR image to RGB.
    image_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # Process the image and detect the pose.
    result = pose.process(image_rgb)

    # Draw the pose annotation on the image.
    if result.pose_landmarks:
        mp_drawing.draw_landmarks(frame, result.pose_landmarks, mp_pose.POSE_CONNECTIONS)

    # Display the frame with pose landmarks.
    cv2.imshow('Pose Estimation', frame)

    # Break the loop if 'q' is pressed.
    if cv2.waitKey(10) & 0xFF == ord('q'):
        break

# Release the video capture object and close display window.
cap.release()
cv2.destroyAllWindows()
The result is shown below:
Example 2: pose estimation on a video stream with 3D plotting. The code is as follows:
import cv2
import mediapipe as mp
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
from matplotlib.animation import FuncAnimation

# Initialize mediapipe pose class.
mp_pose = mp.solutions.pose
pose = mp_pose.Pose()

# Initialize mediapipe drawing class, useful for annotation.
mp_drawing = mp.solutions.drawing_utils

# Load the video file.
cap = cv2.VideoCapture('D:/basketball.mp4')

# Check if the video is opened successfully.
if not cap.isOpened():
    print("Error: Could not open video.")
    exit()

fig = plt.figure(figsize=(10, 5))
ax2d = fig.add_subplot(121)
ax3d = fig.add_subplot(122, projection='3d')

def update(frame_number):
    ret, frame = cap.read()
    if not ret:
        print("Reached the end of the video.")
        return

    # Convert the BGR image to RGB.
    image_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # Process the image and detect the pose.
    result = pose.process(image_rgb)

    # Clear the previous plots
    ax2d.clear()
    ax3d.clear()

    # Draw the pose annotation on the image.
    if result.pose_landmarks:
        mp_drawing.draw_landmarks(frame, result.pose_landmarks, mp_pose.POSE_CONNECTIONS)

        # Extract the landmark points.
        landmarks = result.pose_landmarks.landmark
        xs = [landmark.x for landmark in landmarks]
        ys = [landmark.y for landmark in landmarks]
        zs = [landmark.z for landmark in landmarks]

        # Plot 2D image
        ax2d.imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        ax2d.set_title('Pose Estimation')
        ax2d.axis('off')

        # Plot 3D landmarks
        ax3d.scatter(xs, ys, zs, c='blue', marker='o')
        ax3d.set_xlim([0, 1])
        ax3d.set_ylim([0, 1])
        ax3d.set_zlim([-1, 1])
        ax3d.set_xlabel('X')
        ax3d.set_ylabel('Y')
        ax3d.set_zlabel('Z')
        ax3d.set_title('3D Pose Landmarks')

ani = FuncAnimation(fig, update, interval=10)
plt.show()

cap.release()
cv2.destroyAllWindows()
The result is shown below:
To join the 3D skeleton points, you can use mpl_toolkits.mplot3d.art3d.Line3DCollection to draw the bone connections. You need to define the point pairs that form each connection and use them to draw line segments in the 3D plot. Here is the updated code:
import cv2
import mediapipe as mp
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from mpl_toolkits.mplot3d.art3d import Line3DCollection
import numpy as np
from matplotlib.animation import FuncAnimation

# Initialize mediapipe pose class.
mp_pose = mp.solutions.pose
pose = mp_pose.Pose()

# Initialize mediapipe drawing class, useful for annotation.
mp_drawing = mp.solutions.drawing_utils

# Load the video file.
cap = cv2.VideoCapture('D:/basketball.mp4')

# Check if the video is opened successfully.
if not cap.isOpened():
    print("Error: Could not open video.")
    exit()

fig = plt.figure(figsize=(10, 5))
ax2d = fig.add_subplot(121)
ax3d = fig.add_subplot(122, projection='3d')

def update(frame_number):
    ret, frame = cap.read()
    if not ret:
        print("Reached the end of the video.")
        return

    # Convert the BGR image to RGB.
    image_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # Process the image and detect the pose.
    result = pose.process(image_rgb)

    # Clear the previous plots
    ax2d.clear()
    ax3d.clear()

    # Draw the pose annotation on the image.
    if result.pose_landmarks:
        mp_drawing.draw_landmarks(frame, result.pose_landmarks, mp_pose.POSE_CONNECTIONS)

        # Extract the landmark points.
        landmarks = result.pose_landmarks.landmark
        xs = [landmark.x for landmark in landmarks]
        ys = [landmark.y for landmark in landmarks]
        zs = [landmark.z for landmark in landmarks]

        # Define the connections between landmarks
        connections = [
            (0, 1), (1, 2), (2, 3), (3, 7), (0, 4), (4, 5), (5, 6), (6, 8),
            (9, 10), (11, 12), (11, 13), (13, 15), (15, 17), (15, 19), (15, 21),
            (17, 19), (12, 14), (14, 16), (16, 18), (16, 20), (16, 22), (18, 20),
            (11, 23), (12, 24), (23, 24), (23, 25), (24, 26), (25, 27), (26, 28),
            (27, 29), (28, 30), (29, 31), (30, 32)
        ]

        # Create a list of 3D lines
        lines = [[(xs[start], ys[start], zs[start]), (xs[end], ys[end], zs[end])]
                 for start, end in connections]

        # Plot 2D image
        ax2d.imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        ax2d.set_title('Pose Estimation')
        ax2d.axis('off')

        # Plot 3D landmarks and connections
        ax3d.scatter(xs, ys, zs, c='blue', marker='o')
        ax3d.add_collection3d(Line3DCollection(lines, colors='blue', linewidths=2))
        ax3d.set_xlim([0, 1])
        ax3d.set_ylim([0, 1])
        ax3d.set_zlim([-1, 1])
        ax3d.set_xlabel('X')
        ax3d.set_ylabel('Y')
        ax3d.set_zlabel('Z')
        ax3d.set_title('3D Pose Landmarks')

ani = FuncAnimation(fig, update, interval=10)
plt.show()

cap.release()
cv2.destroyAllWindows()
In this code, we define a connections list containing the pairs of skeleton points to join. We then build a lines list holding the 3D line segments for those connections and add them to the 3D plot with ax3d.add_collection3d(Line3DCollection(lines, colors='blue', linewidths=2)). After running this script, the 3D plot shows not only the skeleton points but also the lines joining them, forming a complete skeleton.
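A quick sanity check on a connection table like this can catch typos before they cause an IndexError at draw time. The standalone snippet below (copying the list from the code above) verifies that every index is a valid MediaPipe Pose landmark and that all 33 landmarks are covered:

```python
# The same landmark pairs used for drawing the skeleton above.
connections = [
    (0, 1), (1, 2), (2, 3), (3, 7), (0, 4), (4, 5), (5, 6), (6, 8),
    (9, 10), (11, 12), (11, 13), (13, 15), (15, 17), (15, 19), (15, 21),
    (17, 19), (12, 14), (14, 16), (16, 18), (16, 20), (16, 22), (18, 20),
    (11, 23), (12, 24), (23, 24), (23, 25), (24, 26), (25, 27), (26, 28),
    (27, 29), (28, 30), (29, 31), (30, 32)
]

# Every index must be a valid MediaPipe Pose landmark (0–32) ...
used = {i for pair in connections for i in pair}
assert used <= set(range(33))

# ... and every landmark from 0 to 32 appears at least once,
# so the drawn skeleton has no orphan joints.
print(len(connections), min(used), max(used))  # → 33 0 32
```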
The result is shown below:
In the code above, the skeleton in the 3D plot appears upside down. You can adjust the 3D axes so the skeleton displays as a normal human pose, by setting the axis ranges and directions. The modified code below adjusts the axis ranges and orientation so the skeleton displays correctly:
import cv2
import mediapipe as mp
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from mpl_toolkits.mplot3d.art3d import Line3DCollection
import numpy as np
from matplotlib.animation import FuncAnimation

# Initialize mediapipe pose class.
mp_pose = mp.solutions.pose
pose = mp_pose.Pose()

# Initialize mediapipe drawing class, useful for annotation.
mp_drawing = mp.solutions.drawing_utils

# Load the video file.
cap = cv2.VideoCapture('D:/basketball.mp4')

# Check if the video is opened successfully.
if not cap.isOpened():
    print("Error: Could not open video.")
    exit()

fig = plt.figure(figsize=(10, 5))
ax2d = fig.add_subplot(121)
ax3d = fig.add_subplot(122, projection='3d')

def update(frame_number):
    ret, frame = cap.read()
    if not ret:
        print("Reached the end of the video.")
        return

    # Convert the BGR image to RGB.
    image_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # Process the image and detect the pose.
    result = pose.process(image_rgb)

    # Clear the previous plots
    ax2d.clear()
    ax3d.clear()

    # Draw the pose annotation on the image.
    if result.pose_landmarks:
        mp_drawing.draw_landmarks(frame, result.pose_landmarks, mp_pose.POSE_CONNECTIONS)

        # Extract the landmark points.
        landmarks = result.pose_landmarks.landmark
        xs = [landmark.x for landmark in landmarks]
        ys = [landmark.y for landmark in landmarks]
        zs = [-landmark.z for landmark in landmarks]  # Negate the z-axis for better visualization

        # Define the connections between landmarks
        connections = [
            (0, 1), (1, 2), (2, 3), (3, 7), (0, 4), (4, 5), (5, 6), (6, 8),
            (9, 10), (11, 12), (11, 13), (13, 15), (15, 17), (15, 19), (15, 21),
            (17, 19), (12, 14), (14, 16), (16, 18), (16, 20), (16, 22), (18, 20),
            (11, 23), (12, 24), (23, 24), (23, 25), (24, 26), (25, 27), (26, 28),
            (27, 29), (28, 30), (29, 31), (30, 32)
        ]

        # Create a list of 3D lines
        lines = [[(xs[start], ys[start], zs[start]), (xs[end], ys[end], zs[end])]
                 for start, end in connections]

        # Plot 2D image
        ax2d.imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        ax2d.set_title('Pose Estimation')
        ax2d.axis('off')

        # Plot 3D landmarks and connections
        ax3d.scatter(xs, ys, zs, c='blue', marker='o')
        ax3d.add_collection3d(Line3DCollection(lines, colors='blue', linewidths=2))
        ax3d.set_xlim([0, 1])
        ax3d.set_ylim([1, 0])  # Flip the y-axis for better visualization
        ax3d.set_zlim([1, -1])
        ax3d.set_xlabel('X')
        ax3d.set_ylabel('Y')
        ax3d.set_zlabel('Z')
        ax3d.set_title('3D Pose Landmarks')

ani = FuncAnimation(fig, update, interval=10)
plt.show()

cap.release()
cv2.destroyAllWindows()
In this code:
- Negating the zs coordinates (zs = [-landmark.z for landmark in landmarks]) makes the Z direction of the skeleton points match the expected orientation.
- Setting ax3d.set_ylim([1, 0]) flips the Y axis to better match common viewing conventions.
After running this script, the skeleton in the 3D plot should appear as a normal human pose.
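Note that the two fixes act differently: negating zs changes the data itself, while set_ylim([1, 0]) only reverses the display direction and leaves the data untouched. The small sketch below (with made-up landmark values) shows the data-side transform in isolation:

```python
import numpy as np

# Hypothetical normalized landmarks as rows of (x, y, z); values are made up.
pts = np.array([
    [0.50, 0.10, -0.20],  # a head-level point, closer to the camera (negative z)
    [0.50, 0.90,  0.05],  # a foot-level point, slightly behind the hips
])

# Negate z only, exactly what the list comprehension in the script does.
flipped = pts * np.array([1.0, 1.0, -1.0])

# x and y are unchanged; the depth ordering of the two points is reversed.
print(flipped[:, 2])
```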
The result is shown below: