Getting Started with openpi

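Contents

Preface

1. Requirements

2. Installation

3. Model Checkpoints

3.1 Base Models

3.2 Fine-Tuned Models

4. Running Inference with a Pre-Trained Model

5. Fine-Tuning Base Models on Your Own Data

5.1. Converting Your Data to a LeRobot Dataset

5.2. Defining Training Configs and Running Training

5.3. Spinning Up a Policy Server and Running Inference

5.4 More Examples

6. Troubleshooting

7. Running openpi Models Remotely

7.1 Starting a Remote Policy Server

7.2 Querying the Remote Policy Server from Robot Code

8. Inference Tutorial

8.1 Policy Inference

8.2 Working with a Live Model

9. Inspecting Policy Records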

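Preface

openpi contains open-source models and packages for robotics, published by the Physical Intelligence team.

Currently, the repo contains two types of models:

The π0 model, a flow-based diffusion vision-language-action model (VLA)
The π0-FAST model, an autoregressive VLA based on the FAST action tokenizer

For both models, we provide base model checkpoints pre-trained on 10k+ hours of robot data, as well as examples for using them out of the box or fine-tuning them on your own datasets.

This is an experiment: π0 was developed for our own robots, which differ from widely used platforms such as ALOHA and DROID. Although we are optimistic that researchers and practitioners will be able to run creative new experiments adapting π0 to their own platforms, we do not expect every such attempt to succeed. All this is to say: π0 may or may not work for you, but you are welcome to try it and see!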

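1. Requirements

To run the models in this repository, you will need an NVIDIA GPU with at least the following specifications. These estimates assume a single GPU, but you can also use multiple GPUs with model parallelism to reduce per-GPU memory requirements by setting fsdp_devices in the training config. Note that the current training script does not yet support multi-node training.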

Mode	Memory Required	Example GPU
Inference	> 8 GB	RTX 4090
Fine-Tuning (LoRA)	> 22.5 GB	RTX 4090
Fine-Tuning (Full)	> 70 GB	A100 (80GB) / H100

This package has been tested on Ubuntu 22.04; other operating systems are not currently supported.

2. Installation

When cloning this repo, make sure to update the submodules:

git clone --recurse-submodules git@github.com:Physical-Intelligence/openpi.git

# Or if you already cloned the repo:
git submodule update --init --recursive

We use uv to manage Python dependencies. See the uv installation instructions to set it up. Once uv is installed, run the following to set up the environment:

GIT_LFS_SKIP_SMUDGE=1 uv sync
GIT_LFS_SKIP_SMUDGE=1 uv pip install -e .

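Note: GIT_LFS_SKIP_SMUDGE=1 is needed to pull LeRobot as a dependency.

Docker: As an alternative to the uv installation, we provide instructions for installing openpi with Docker. If you run into problems with your system setup, consider using Docker to simplify installation. See Docker Setup for more details.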

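3. Model Checkpoints

3.1 Base Models

We provide multiple base VLA model checkpoints. These checkpoints have been pre-trained on 10k+ hours of robot data and can be used for fine-tuning.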

Model	Use Case	Description	Checkpoint Path
π0	Fine-Tuning	Base diffusion π0 model for fine-tuning	`s3://openpi-assets/checkpoints/pi0_base`
π0-FAST	Fine-Tuning	Base autoregressive π0-FAST model for fine-tuning	`s3://openpi-assets/checkpoints/pi0_fast_base`

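3.2 Fine-Tuned Models

We also provide "expert" checkpoints for various robot platforms and tasks. These models are fine-tuned from the base models above and are intended to run directly on the target robot. They may or may not work on your particular robot: because they were fine-tuned on relatively small datasets collected with more widely available robots, such as ALOHA and the DROID Franka setup, they may not generalize to your specific setup. That said, we have found some of them, the DROID checkpoint in particular, to generalize quite broadly in practice.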

Model	Use Case	Description	Checkpoint Path
π0-FAST-DROID	Inference	π0-FAST model fine-tuned on the DROID dataset, can perform a wide range of simple table-top manipulation tasks 0-shot in new scenes on the DROID robot platform	`s3://openpi-assets/checkpoints/pi0_fast_droid`
π0-DROID	Fine-Tuning	π0 model fine-tuned on the DROID dataset, faster inference than π0-FAST-DROID, but may not follow language commands as well	`s3://openpi-assets/checkpoints/pi0_droid`
π0-ALOHA-towel	Inference	π0 model fine-tuned on internal ALOHA data, can fold diverse towels 0-shot on ALOHA robot platforms	`s3://openpi-assets/checkpoints/pi0_aloha_towel`
π0-ALOHA-tupperware	Inference	π0 model fine-tuned on internal ALOHA data, can unpack food from a tupperware container	`s3://openpi-assets/checkpoints/pi0_aloha_tupperware`
π0-ALOHA-pen-uncap	Inference	π0 model fine-tuned on public ALOHA data, can uncap a pen	`s3://openpi-assets/checkpoints/pi0_aloha_pen_uncap`

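By default, checkpoints are automatically downloaded from s3://openpi-assets and cached in ~/.cache/openpi when needed. You can override the download path by setting the OPENPI_DATA_HOME environment variable.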
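The cache-location rule can be sketched in a few lines. This is a minimal illustration, not openpi's own code: it assumes the cache root is OPENPI_DATA_HOME when set and ~/.cache/openpi otherwise, and that an s3:// checkpoint is mirrored under that root by bucket name and object key. Both helper names (openpi_cache_dir, local_checkpoint_path) are hypothetical.

```python
import os
import pathlib
from urllib.parse import urlparse


def openpi_cache_dir() -> pathlib.Path:
    """Hypothetical helper: cache root, honoring OPENPI_DATA_HOME if it is set."""
    root = os.environ.get("OPENPI_DATA_HOME", "~/.cache/openpi")
    return pathlib.Path(root).expanduser()


def local_checkpoint_path(url: str) -> pathlib.Path:
    """Hypothetical helper: map an s3:// URL to a location under the cache root.

    Assumes the bucket name and object key are mirrored as subdirectories.
    """
    parsed = urlparse(url)
    if parsed.scheme != "s3":
        raise ValueError(f"expected an s3:// URL, got {url!r}")
    return openpi_cache_dir() / parsed.netloc / parsed.path.strip("/")
```

Under these assumptions, with OPENPI_DATA_HOME=/data/openpi the π0-FAST-DROID checkpoint would land in /data/openpi/openpi-assets/checkpoints/pi0_fast_droid, which is handy when pre-populating a shared cache on a cluster.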

4. Running Inference with a Pre-Trained Model

Our pre-trained model checkpoints can be run with a few lines of code (here, our π0-FAST-DROID model):

from openpi.training import config
from openpi.policies import policy_config
from openpi.shared import download

config = config.get_config("pi0_fast_droid")
checkpoint_dir = download.maybe_download("s3://openpi-assets/checkpoints/pi0_fast_droid")

# Create a trained policy.
policy = policy_config.create_trained_policy(config, checkpoint_dir)

# Run inference on a dummy example.
example = {
    "observation/exterior_image_1_left": ...,
    "observation/wrist_image_left": ...,
    ...
    "prompt": "pick up the fork"
}
action_chunk = policy.infer(example)["actions"]

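You can also test this out in the example notebook.

We provide detailed step-by-step examples for running inference with our pre-trained checkpoints on DROID and ALOHA robots.

Remote inference: We provide examples and code for running inference on our models remotely: the model can run on a different server and stream actions to the robot via a websocket connection. This makes it easy to use more powerful GPUs off-robot and keep the robot and policy environments separate.
Testing inference without a robot: We provide a script for testing inference without a robot. The script generates random observations and runs inference with the model. See here for more details.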
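To see what a "random observation" for the DROID policy might look like, here is a small sketch. The two image keys match the inference example above; the joint and gripper key names and their sizes (7 joints plus 1 gripper value) are assumptions for illustration, and make_random_droid_observation is a hypothetical helper rather than part of openpi.

```python
import numpy as np


def make_random_droid_observation(rng: np.random.Generator, prompt: str = "pick up the fork") -> dict:
    """Hypothetical helper: a random DROID-style observation dict.

    Image keys follow the example above; the proprioceptive key names and
    sizes are assumptions, not a confirmed openpi schema.
    """
    return {
        # Camera frames as HxWx3 uint8 arrays.
        "observation/exterior_image_1_left": rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8),
        "observation/wrist_image_left": rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8),
        # Assumed proprioceptive state: 7 joint positions and 1 gripper position.
        "observation/joint_position": rng.random(7),
        "observation/gripper_position": rng.random(1),
        "prompt": prompt,
    }
```

A dictionary like this could then be passed to policy.infer(...) in place of the elided example; section 8.1 below shows the library's own droid_policy.make_droid_example(), which is the safer choice when the exact keys matter.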

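5. Fine-Tuning Base Models on Your Own Data

As a running example of fine-tuning a base model on your own data, we will fine-tune the π0-FAST model on the Libero dataset. We explain three steps:

Convert your data to a LeRobot dataset (which we use for training)
Define training configs and run training
Spin up a policy server and run inference

5.1. Converting Your Data to a LeRobot Dataset

We provide a minimal example script for converting Libero data to a LeRobot dataset in examples/libero/convert_libero_data_to_lerobot.py. You can easily modify it to convert your own data! You can download the raw Libero dataset from here and run the script with: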

uv run examples/libero/convert_libero_data_to_lerobot.py --data_dir /path/to/your/libero/data
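Conceptually, the conversion turns each raw episode into a sequence of per-timestep frames with the task instruction attached. The sketch below shows only that reshaping step in plain NumPy; the RawEpisode layout, the feature names (image, wrist_image, state, actions, task), and the episode_to_frames helper are assumptions for illustration. Writing the frames into an actual LeRobot dataset is left to the example script.

```python
from __future__ import annotations

from dataclasses import dataclass

import numpy as np


@dataclass
class RawEpisode:
    """Hypothetical raw episode layout: time-major arrays plus a language instruction."""
    images: np.ndarray        # (T, H, W, 3) uint8, third-person camera
    wrist_images: np.ndarray  # (T, H, W, 3) uint8, wrist camera
    states: np.ndarray        # (T, state_dim) float32
    actions: np.ndarray       # (T, action_dim) float32
    instruction: str


def episode_to_frames(ep: RawEpisode) -> list[dict]:
    """Split one episode into per-timestep frame dicts (feature names are assumed)."""
    num_steps = len(ep.actions)
    # Every stream must have one entry per timestep.
    for name in ("images", "wrist_images", "states"):
        if len(getattr(ep, name)) != num_steps:
            raise ValueError(f"{name} has {len(getattr(ep, name))} steps, expected {num_steps}")
    return [
        {
            "image": ep.images[t],
            "wrist_image": ep.wrist_images[t],
            "state": ep.states[t],
            "actions": ep.actions[t],
            "task": ep.instruction,
        }
        for t in range(num_steps)
    ]
```

Keeping the reshaping separate from the dataset-writing code makes it easy to unit-test the part that is specific to your own data format.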

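5.2. Defining Training Configs and Running Training

To fine-tune a base model on your own data, you need to define configs for data processing and training. Below we provide example configs with detailed comments for Libero, which you can modify for your own dataset:

LiberoInputs and LiberoOutputs: define the data mapping from the Libero environment to the model and vice versa. Used for both training and inference.
LeRobotLiberoDataConfig: defines how to process raw Libero data from the LeRobot dataset for training.
TrainConfig: defines the fine-tuning hyperparameters, the data config, and the weight loader.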
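To make the role of an inputs/outputs mapping concrete, here is a hedged sketch of the idea behind LiberoInputs and LiberoOutputs. It is not the openpi implementation: the environment-side keys are borrowed from the client example in section 7.2, while the model-side key names (base_0_rgb, left_wrist_0_rgb, right_wrist_0_rgb), the zero-padding of the state, and the assumed 7-dimensional Libero action are illustrative assumptions. All function names are hypothetical.

```python
import numpy as np


def pad_to_dim(x: np.ndarray, dim: int) -> np.ndarray:
    """Zero-pad the last axis of x up to `dim`."""
    if x.shape[-1] > dim:
        raise ValueError(f"cannot pad size {x.shape[-1]} down to {dim}")
    pad = [(0, 0)] * (x.ndim - 1) + [(0, dim - x.shape[-1])]
    return np.pad(x, pad)


def libero_inputs_sketch(data: dict, model_action_dim: int) -> dict:
    """Hypothetical env -> model mapping; model-side names are assumptions."""
    base = np.asarray(data["observation/image"])
    wrist = np.asarray(data["observation/wrist_image"])
    return {
        # Pad the robot state to the model's fixed vector width.
        "state": pad_to_dim(np.asarray(data["observation/state"], dtype=np.float32), model_action_dim),
        # Libero has no second wrist camera, so fill that slot with zeros...
        "image": {"base_0_rgb": base, "left_wrist_0_rgb": wrist, "right_wrist_0_rgb": np.zeros_like(base)},
        # ...and mask it out so the model ignores it.
        "image_mask": {"base_0_rgb": True, "left_wrist_0_rgb": True, "right_wrist_0_rgb": False},
        "prompt": data["prompt"],
    }


def libero_outputs_sketch(model_out: dict, env_action_dim: int = 7) -> dict:
    """Hypothetical model -> env mapping: keep only the action dims the robot uses."""
    return {"actions": np.asarray(model_out["actions"])[..., :env_action_dim]}
```

The design point this illustrates is symmetry: whatever padding or renaming the inputs transform applies on the way in, the outputs transform undoes on the way out, so the robot only ever sees its own action space.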

We provide example fine-tuning configs for π0 and π0-FAST on Libero data.

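Before running training, we need to compute the normalization statistics for the training data. Run the script below with the name of your training config: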

uv run scripts/compute_norm_stats.py --config-name pi0_fast_libero
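Normalization statistics are per-dimension summaries of the state and action vectors, computed once over the training set and reused at training and inference time. The sketch below assumes mean, standard deviation, and 1st/99th percentiles are what gets recorded; the fields actually written by compute_norm_stats.py may differ, and both functions are hypothetical.

```python
import numpy as np


def compute_norm_stats_sketch(batches) -> dict:
    """Hypothetical helper: per-dimension statistics over every frame in `batches`.

    Each batch is an array of shape (batch, dim). All batches are held in memory,
    which is fine for a sketch but not for a large dataset.
    """
    data = np.concatenate([np.asarray(b, dtype=np.float64) for b in batches], axis=0)
    return {
        "mean": data.mean(axis=0),
        "std": data.std(axis=0),
        "q01": np.quantile(data, 0.01, axis=0),
        "q99": np.quantile(data, 0.99, axis=0),
    }


def normalize(x, stats, eps: float = 1e-6):
    """Z-score normalization; eps guards dimensions with zero variance."""
    return (x - stats["mean"]) / (stats["std"] + eps)
```

For example, states [[0, 10], [2, 10]] and [[4, 10]] give a mean of [2, 10]; the second dimension is constant, which is exactly the case the eps term protects against.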

Now we can kick off training with the following command (the --overwrite flag overwrites existing checkpoints if you rerun fine-tuning with the same config):

XLA_PYTHON_CLIENT_MEM_FRACTION=0.9 uv run scripts/train.py pi0_fast_libero --exp-name=my_experiment --overwrite

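The command logs training progress to the console and saves checkpoints to the checkpoints directory. You can also monitor training progress on the Weights & Biases dashboard. To make maximal use of GPU memory, set XLA_PYTHON_CLIENT_MEM_FRACTION=0.9 before running training; this lets JAX use up to 90% of GPU memory (the default is 75%).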
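When launching JAX from your own Python script instead of the command line, the same setting has to be in the environment before JAX initializes its GPU backend; setting it before importing jax is the safe order. The helper below is a hypothetical back-of-the-envelope calculation of how much memory that fraction reserves, not an openpi or JAX API.

```python
import os

# Must happen before JAX initializes its GPU backend; import jax only after this line.
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "0.9"


def jax_preallocation_gib(total_gib: float) -> float:
    """Hypothetical helper: GPU memory JAX would preallocate at the current fraction (default 0.75)."""
    fraction = float(os.environ.get("XLA_PYTHON_CLIENT_MEM_FRACTION", "0.75"))
    return total_gib * fraction


# import jax  # safe to import now
```

On an 80 GB A100, for instance, 0.9 reserves about 72 GB versus 60 GB at the default of 0.75.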

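Note: We provide functionality for reloading state/action normalization statistics from pre-training. This can be useful if you are fine-tuning on a new task with a robot that was part of our pre-training mixture. For details on how to reload normalization statistics, see the norm_stats.md file.

5.3. Spinning Up a Policy Server and Running Inference

Once training is complete, we can run inference by spinning up a policy server and querying it from the Libero evaluation script. Launching a model server is easy (this example uses the checkpoint from iteration 20,000; modify as needed):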

uv run scripts/serve_policy.py policy:checkpoint --policy.config=pi0_fast_libero --policy.dir=checkpoints/pi0_fast_libero/my_experiment/20000

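This spins up a server that listens on port 8000 and waits for observations to be sent to it. We can then run the Libero evaluation script to query the server. For instructions on installing Libero and running the evaluation script, see the Libero README.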

If you want to embed a policy server call in your own robot runtime, we provide a minimal example in the remote inference docs.
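The serve command in this section hard-codes step 20000. A small helper can instead pick the most recent checkpoint, assuming, as the path checkpoints/pi0_fast_libero/my_experiment/20000 suggests, that each saved step is an integer-named subdirectory of the experiment directory. latest_checkpoint_dir is a hypothetical name, not part of openpi.

```python
from __future__ import annotations

import pathlib


def latest_checkpoint_dir(exp_dir: str | pathlib.Path) -> pathlib.Path:
    """Hypothetical helper: return the highest-numbered step directory in exp_dir.

    Assumes steps are saved as integer-named subdirectories; anything else is ignored.
    """
    exp_dir = pathlib.Path(exp_dir)
    steps = [p for p in exp_dir.iterdir() if p.is_dir() and p.name.isdigit()]
    if not steps:
        raise FileNotFoundError(f"no step directories found in {exp_dir}")
    # Compare numerically so that "20000" sorts after "5000".
    return max(steps, key=lambda p: int(p.name))
```

The result can be passed as --policy.dir=..., e.g. str(latest_checkpoint_dir("checkpoints/pi0_fast_libero/my_experiment")). Sorting numerically rather than lexically is the important detail, since "5000" > "20000" as strings.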

5.4 More Examples

We provide more examples of how to fine-tune and run inference with our models on the ALOHA platform in the following READMEs:

ALOHA Simulator
ALOHA Real
UR5

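6. Troubleshooting

We will collect common issues and their solutions here. If you encounter an issue, please check here first. If you can't find a solution, please file an issue on the repo (see here for guidelines).

Issue	Resolution
`uv sync` fails with dependency conflicts	Try removing the virtual environment directory (rm -rf .venv) and running uv sync again. If the issue persists, check that you have the latest version of uv installed (uv self update).
Training runs out of GPU memory	Make sure you set XLA_PYTHON_CLIENT_MEM_FRACTION=0.9 before running training to allow JAX to use more GPU memory. You can also try reducing the batch size in your training config.
Policy server connection errors	Check that the server is running and listening on the expected port. Verify network connectivity and firewall settings between client and server.
Missing norm stats error when training	Run scripts/compute_norm_stats.py with your config name before starting training.
Dataset download fails	Check your internet connection. If using local_files_only=True, verify that the dataset exists locally. For HuggingFace datasets, make sure you are logged in (huggingface-cli login).
CUDA/GPU errors	Verify that the NVIDIA drivers and CUDA toolkit are installed correctly. For Docker, make sure nvidia-container-toolkit is installed. Check GPU compatibility.
Import errors when running examples	Make sure you have installed all dependencies with uv sync and activated the virtual environment. Some examples may have additional requirements listed in their READMEs.
Action dimension mismatch	Verify that your data processing transforms match the expected input/output dimensions of your robot. Check the action space definitions in your policy classes.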
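For the "policy server connection errors" row, a quick first check is whether anything accepts TCP connections on the server's host and port (8000 by default). The sketch uses only the Python standard library; server_is_listening is a hypothetical helper, and a True result shows only that the port is reachable, not that the process behind it is a policy server or that the websocket handshake will succeed.

```python
import socket


def server_is_listening(host: str = "localhost", port: int = 8000, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unresolvable hosts.
        return False
```

Running it from the robot machine against the server's IP address separates network or firewall problems (False) from problems inside the server or client code (True, yet requests still fail).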

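7. Running openpi Models Remotely

We provide utilities for running openpi models remotely. This is useful for running inference on more powerful GPUs off-robot, and it also helps keep the robot and policy environments separate (e.g., to avoid dependency hell with robot software).

7.1 Starting a Remote Policy Server

To start a remote policy server, simply run the following command: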

uv run scripts/serve_policy.py --env=[DROID | ALOHA | LIBERO]

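The env argument specifies which π0 checkpoint should be loaded. Under the hood, the script runs a command like the following, which you can also use directly to start a policy server, e.g. for a checkpoint you trained yourself (the DROID environment is shown here):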

uv run scripts/serve_policy.py policy:checkpoint --policy.config=pi0_fast_droid --policy.dir=s3://openpi-assets/checkpoints/pi0_fast_droid

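This starts a policy server that serves the policy specified by the config and dir arguments. The policy is served on the specified port (default: 8000).

7.2 Querying the Remote Policy Server from Robot Code

We provide a client utility with minimal dependencies that you can easily embed into any robot codebase.

First, install the openpi-client package in your robot environment: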

cd $OPENPI_ROOT/packages/openpi-client
pip install -e .

Then you can use the client to query the remote policy server from your robot code. Here is an example of how to do this:

from openpi_client import image_tools
from openpi_client import websocket_client_policy

# Outside of episode loop, initialize the policy client.
# Point to the host and port of the policy server (localhost and 8000 are the defaults).
client = websocket_client_policy.WebsocketClientPolicy(host="localhost", port=8000)

for step in range(num_steps):
    # Inside the episode loop, construct the observation.
    # Resize images on the client side to minimize bandwidth / latency. Always return images in uint8 format.
    # We provide utilities for resizing images + uint8 conversion so you match the training routines.
    # The typical resize_size for pre-trained pi0 models is 224.
    # Note that the proprioceptive `state` can be passed unnormalized, normalization will be handled on the server side.
    observation = {
        "observation/image": image_tools.convert_to_uint8(
            image_tools.resize_with_pad(img, 224, 224)
        ),
        "observation/wrist_image": image_tools.convert_to_uint8(
            image_tools.resize_with_pad(wrist_img, 224, 224)
        ),
        "observation/state": state,
        "prompt": task_instruction,
    }

    # Call the policy server with the current observation.
    # This returns an action chunk of shape (action_horizon, action_dim).
    # Note that you typically only need to call the policy every N steps and execute steps
    # from the predicted action chunk open-loop in the remaining steps.
    action_chunk = client.infer(observation)["actions"]

    # Execute the actions in the environment.
    ...

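Here, the host and port arguments specify the IP address and port of the remote policy server. You can also pass these as command-line arguments to your robot code, or hard-code them in your robot codebase. The observation is a dictionary containing the observations and the prompt, matching the policy inputs expected by the policy you are serving. Concrete examples of how to construct this dictionary for different environments are in the simple client example.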
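The comment in the client example about calling the policy only every N steps can be made concrete with a small loop. This is a sketch under stated assumptions: FakePolicy is a hypothetical stand-in for WebsocketClientPolicy that mimics only its infer(observation)["actions"] return shape, and run_episode, get_observation, and execute_action are hypothetical names for your own robot code.

```python
import numpy as np


class FakePolicy:
    """Hypothetical stand-in for a policy client: returns an (action_horizon, action_dim) chunk."""

    def __init__(self, action_horizon: int = 10, action_dim: int = 7):
        self.action_horizon = action_horizon
        self.action_dim = action_dim
        self.num_calls = 0

    def infer(self, observation: dict) -> dict:
        self.num_calls += 1
        return {"actions": np.zeros((self.action_horizon, self.action_dim))}


def run_episode(policy, get_observation, execute_action, num_steps: int, replan_every: int) -> None:
    """Query the policy every `replan_every` steps and run the chunk open-loop in between."""
    action_chunk = None
    for step in range(num_steps):
        if step % replan_every == 0:
            action_chunk = policy.infer(get_observation())["actions"]
            if replan_every > len(action_chunk):
                raise ValueError("replan_every must not exceed the action horizon")
        # Index into the current chunk relative to when it was requested.
        execute_action(action_chunk[step % replan_every])
```

With a 10-step action horizon and replan_every=5, a 25-step episode makes 5 server calls instead of 25. Smaller values react faster to new observations at the cost of more round trips; values up to the action horizon trade reactivity for fewer calls.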

8. Inference Tutorial

import dataclasses

import jax

from openpi.models import model as _model
from openpi.policies import droid_policy
from openpi.policies import policy_config as _policy_config
from openpi.shared import download
from openpi.training import config as _config
from openpi.training import data_loader as _data_loader

8.1 Policy Inference

The following example shows how to create a policy from a checkpoint and run inference on a dummy example.

config = _config.get_config("pi0_fast_droid")
checkpoint_dir = download.maybe_download("s3://openpi-assets/checkpoints/pi0_fast_droid")

# Create a trained policy.
policy = _policy_config.create_trained_policy(config, checkpoint_dir)

# Run inference on a dummy example. This example corresponds to observations produced by the DROID runtime.
example = droid_policy.make_droid_example()
result = policy.infer(example)

# Delete the policy to free up memory.
del policy

print("Actions shape:", result["actions"].shape)

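8.2 Working with a Live Model

The following example shows how to create a live model from a checkpoint and compute the training loss. First, we show how to do this with fake data.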

config = _config.get_config("pi0_aloha_sim")
checkpoint_dir = download.maybe_download("s3://openpi-assets/checkpoints/pi0_aloha_sim")
key = jax.random.key(0)

# Create a model from the checkpoint.
model = config.model.load(_model.restore_params(checkpoint_dir / "params"))

# We can create fake observations and actions to test the model.
obs, act = config.model.fake_obs(), config.model.fake_act()

# Compute the training loss on the fake batch.
loss = model.compute_loss(key, obs, act)
print("Loss shape:", loss.shape)

Next, we create a data loader and compute the loss on a real batch of training data.

# Reduce the batch size to reduce memory usage.
config = dataclasses.replace(config, batch_size=2)

# Load a single batch of data. This is the same data that will be used during training.
# NOTE: In order to make this example self-contained, we are skipping the normalization step
# since it requires the normalization statistics to be generated using `compute_norm_stats`.
loader = _data_loader.create_data_loader(config, num_batches=1, skip_norm_stats=True)
obs, act = next(iter(loader))

# Compute the training loss on the real batch.
loss = model.compute_loss(key, obs, act)

# Delete the model to free up memory.
del model

print("Loss shape:", loss.shape)

9. Inspecting Policy Records

import pathlib

import numpy as np

record_path = pathlib.Path("../policy_records")
num_steps = len(list(record_path.glob("step_*.npy")))

records = []
for i in range(num_steps):
    record = np.load(record_path / f"step_{i}.npy", allow_pickle=True).item()
    records.append(record)

print("length of records", len(records))
print("keys in records", records[0].keys())

for k in records[0]:
    print(f"{k} shape: {records[0][k].shape}")

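from IPython.display import display  # already available as a builtin inside Jupyter
from PIL import Image


def get_image(step: int, idx: int = 0):
    # Images are stored as floats in [0, 1] with layout (num_cams, C, H, W).
    img = (255 * records[step]["inputs/image"]).astype(np.uint8)
    return img[idx].transpose(1, 2, 0)


def show_image(step: int, idx_lst: list[int]):
    imgs = [get_image(step, idx) for idx in idx_lst]
    return Image.fromarray(np.hstack(imgs))


for i in range(2):
    display(show_image(i, [0]))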

import pandas as pd


def get_axis(name, axis):
    return np.array([record[name][axis] for record in records])


# qpos is [..., 14] of type float:
# 0-5: left arm joint angles
# 6: left arm gripper
# 7-12: right arm joint angles
# 13: right arm gripper
names = [("left_joint", 6), ("left_gripper", 1), ("right_joint", 6), ("right_gripper", 1)]


def make_data():
    cur_dim = 0
    in_data = {}
    out_data = {}
    for name, dim_size in names:
        for i in range(dim_size):
            in_data[f"{name}_{i}"] = get_axis("inputs/qpos", cur_dim)
            out_data[f"{name}_{i}"] = get_axis("outputs/qpos", cur_dim)
            cur_dim += 1
    return pd.DataFrame(in_data), pd.DataFrame(out_data)


in_data, out_data = make_data()

for name in in_data.columns:
    data = pd.DataFrame({f"in_{name}": in_data[name], f"out_{name}": out_data[name]})
    data.plot()
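To try the analysis code in this section without a recorded run, you can generate a few synthetic records first. This sketch assumes the layout the code above reads: one step_{i}.npy file per step, each holding a pickled dict of arrays. The keys and shapes (a float image stack in [0, 1) shaped (num_cams, C, H, W) and 14-dimensional qpos vectors) mirror what that code accesses, but they are assumptions rather than the exact contents of real policy records; write_fake_records and load_records are hypothetical helpers.

```python
from __future__ import annotations

import pathlib

import numpy as np


def write_fake_records(record_dir: pathlib.Path, num_steps: int = 3, seed: int = 0) -> None:
    """Hypothetical helper: write step_{i}.npy files, each a pickled dict of arrays."""
    rng = np.random.default_rng(seed)
    record_dir.mkdir(parents=True, exist_ok=True)
    for i in range(num_steps):
        record = {
            "inputs/image": rng.random((1, 3, 64, 64), dtype=np.float32),  # (num_cams, C, H, W)
            "inputs/qpos": rng.random(14, dtype=np.float32),
            "outputs/qpos": rng.random(14, dtype=np.float32),
        }
        # np.save stores the dict as a 0-d object array, so it must be loaded with allow_pickle=True.
        np.save(record_dir / f"step_{i}.npy", record, allow_pickle=True)


def load_records(record_dir: pathlib.Path) -> list[dict]:
    """Same loading logic as above: one dict per step file, in step order."""
    num_steps = len(list(record_dir.glob("step_*.npy")))
    return [np.load(record_dir / f"step_{i}.npy", allow_pickle=True).item() for i in range(num_steps)]
```

Pointing record_path at the directory you pass to write_fake_records lets the printing, image, and plotting cells run end to end on random data.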