[Robotics] Reproducing DOV-SG Robot Navigation | Dynamic Open-Vocabulary | 3D Scene Graphs

DOV-SG builds a dynamic 3D scene graph and uses a large language model (LLM) for task decomposition, which enables local updates to the scene graph during interactive exploration.

From RA-L 2025: dynamic open-vocabulary 3D scene graphs for long-term language-guided mobile manipulation.

Paper: Dynamic Open-Vocabulary 3D Scene Graphs for Long-term Language-Guided Mobile Manipulation

Code: https://github.com/BJHYZJ/DovSG

This post walks through reproducing DOV-SG and running model inference~

Here is a navigation example.

Navigation visualization (green dot: current position; red dot: target; magenta: planned trajectory).

1. Create the Conda Environment

First create a Conda environment named dovsg with Python 3.9, then activate it.

The two commands:

conda create -n dovsg python=3.9 -y
conda activate dovsg

Then clone the repository (https://github.com/BJHYZJ/DovSG.git) and enter the project directory:

git clone https://github.com/BJHYZJ/DovSG.git
cd DovSG


2. Install PyTorch

Install torch==2.3.1 built against CUDA 12.1:

pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu121  

Wait for the install to finish; it ends with:

Successfully installed MarkupSafe-2.1.5 filelock-3.13.1 fsspec-2024.6.1 jinja2-3.1.4 mpmath-1.3.0 networkx-3.2.1 numpy-1.26.3 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-8.9.2.26 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.20.5 nvidia-nvjitlink-cu12-12.1.105 nvidia-nvtx-cu12-12.1.105 pillow-11.0.0 sympy-1.13.3 torch-2.3.1+cu121 torchaudio-2.3.1+cu121 torchvision-0.18.1+cu121 triton-2.3.1 typing-extensions-4.12.2
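Optionally, you can confirm that the CUDA build is actually active before moving on; a minimal check in Python:

import torch

print(torch.__version__)           # expect 2.3.1+cu121
print(torch.version.cuda)          # expect 12.1
print(torch.cuda.is_available())   # expect True on a CUDA 12.1 machine
x = torch.randn(2, 3, device="cuda" if torch.cuda.is_available() else "cpu")
print(x.sum())                     # a tiny op to exercise the device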

3. Install Segment-Anything-2

Pin segment-anything-2 to commit '7e1596c' so it stays compatible with the other dependencies.

Run:

cd third_party
git clone https://github.com/facebookresearch/sam2.git segment-anything-2
cd segment-anything-2
git checkout 7e1596c


Then edit setup.py; two changes are needed:

# line 27:  "numpy>=1.24.4"              ==>  "numpy>=1.23.0"
# line 144: python_requires=">=3.10.0"   ==>  python_requires=">=3.9.0"

Then install segment-anything-2:

pip install -e ".[demo]" 

Wait for the install to finish~

Attempting uninstall: SAM-2
  Found existing installation: SAM-2 1.0
  Uninstalling SAM-2-1.0:
    Successfully uninstalled SAM-2-1.0
Successfully installed SAM-2-1.0 anyio-4.9.0 argon2-cffi-25.1.0 argon2-cffi-bindings-21.2.0 
arrow-1.3.0 asttokens-3.0.0 async-lru-2.0.5 attrs-25.3.0 babel-2.17.0 beautifulsoup4-4.13.4 
bleach-6.2.0 certifi-2025.6.15 cffi-1.17.1 charset_normalizer-3.4.2 comm-0.2.2 contourpy-1.3.0 cycler-0.12.1 
debugpy-1.8.14 decorator-5.2.1 defusedxml-0.7.1 exceptiongroup-1.3.0 executing-2.2.0 fastjsonschema-2.21.1 
fonttools-4.58.4 fqdn-1.5.1 h11-0.16.0 httpcore-1.0.9 httpx-0.28.1 idna-3.10 importlib-metadata-8.7.0 
importlib-resources-6.5.2 ipykernel-6.29.5 ipython-8.18.1 ipywidgets-8.1.7 isoduration-20.11.0 
jedi-0.19.2 json5-0.12.0 jsonpointer-3.0.0 jsonschema-4.24.0 jsonschema-specifications-2025.4.1 
jupyter-1.1.1 jupyter-client-8.6.3 jupyter-console-6.6.3 jupyter-core-5.8.1 jupyter-events-0.12.0 
jupyter-lsp-2.2.5 jupyter-server-2.16.0 jupyter-server-terminals-0.5.3 jupyterlab-4.4.4 jupyterlab-pygments-0.3.0 jupyterlab-server-2.27.3 jupyterlab_widgets-3.0.15 kiwisolver-1.4.7 
matplotlib-3.9.4 matplotlib-inline-0.1.7 mistune-3.1.3 nbclient-0.10.2 nbconvert-7.16.6 nbformat-5.10.4 nest-asyncio-1.6.0 notebook-7.4.4 notebook-shim-0.2.4 opencv-python-4.11.0.86 overrides-7.7.0 pandocfilters-1.5.1 parso-0.8.4 pexpect-4.9.0 platformdirs-4.3.8 
prometheus-client-0.22.1 prompt-toolkit-3.0.51 psutil-7.0.0 ptyprocess-0.7.0 pure-eval-0.2.3 
pycparser-2.22 pygments-2.19.2 pyparsing-3.2.3 python-dateutil-2.9.0.post0 python-json-logger-3.3.0 
pyzmq-27.0.0 referencing-0.36.2 requests-2.32.4 rfc3339-validator-0.1.4 rfc3986-validator-0.1.1 
rpds-py-0.25.1 send2trash-1.8.3 six-1.17.0 sniffio-1.3.1 soupsieve-2.7 stack-data-0.6.3 terminado-0.18.1 
tinycss2-1.4.0 tomli-2.2.1 tornado-6.5.1 traitlets-5.14.3 types-python-dateutil-2.9.0.20250516 
uri-template-1.3.0 urllib3-2.5.0 wcwidth-0.2.13 webcolors-24.11.1 webencodings-0.5.1 websocket-client-1.8.0 widgetsnbextension-4.0.14 zipp-3.23.0
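Once the SAM2 weights from step 11 are in place, a minimal smoke test can confirm the install. The config name and checkpoint path below are assumptions matching this code version's README, not something the project itself ships:

import numpy as np
import torch
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

checkpoint = "checkpoints/segment-anything-2/sam2_hiera_large.pt"  # from step 11
model_cfg = "sam2_hiera_l.yaml"  # assumed config name for this code version

predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))
image = np.zeros((480, 640, 3), dtype=np.uint8)  # dummy RGB frame
with torch.inference_mode():
    predictor.set_image(image)
    masks, scores, _ = predictor.predict(point_coords=np.array([[320, 240]]),
                                         point_labels=np.array([1]))
print(masks.shape, scores)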

4. Install GroundingDINO

Pin GroundingDINO to commit '856dde2' so it stays compatible with the other dependencies.

Run:

cd ..
git clone https://github.com/IDEA-Research/GroundingDINO.git GroundingDINO
cd GroundingDINO/
git checkout 856dde2


Then install GroundingDINO:

pip install -e . 

Wait for the install to finish~
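After the weights from step 11 are downloaded, a quick open-vocabulary detection call can verify the install; the test image path here is a hypothetical example:

from groundingdino.util.inference import load_model, load_image, predict

model = load_model("checkpoints/GroundingDINO/GroundingDINO_SwinT_OGC.py",
                   "checkpoints/GroundingDINO/groundingdino_swint_ogc.pth")
# hypothetical frame path; any RGB image works
image_source, image = load_image("data_example/room1/rgb/000000.jpg")
boxes, logits, phrases = predict(model=model, image=image,
                                 caption="red pepper . plate .",
                                 box_threshold=0.35, text_threshold=0.25)
print(phrases, boxes.shape)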

5. Install RAM & Tag2Text

Pin recognize-anything to commit '88c2b0c' so it stays compatible with the other dependencies.

Run:

cd ..
git clone https://github.com/xinyu1205/recognize-anything.git
cd recognize-anything/
git checkout 88c2b0c

Then install with:

pip install -r requirements.txt
pip install -e .


Wait for the install to finish~
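Likewise, once ram_swin_large_14m.pth is in place (step 11), RAM can be sanity-checked with the repo's inference helpers; test.jpg stands in for any RGB image:

import torch
from PIL import Image
from ram.models import ram
from ram import inference_ram, get_transform

device = "cuda" if torch.cuda.is_available() else "cpu"
transform = get_transform(image_size=384)
model = ram(pretrained="checkpoints/recognize_anything/ram_swin_large_14m.pth",
            image_size=384, vit="swin_l").eval().to(device)
image = transform(Image.open("test.jpg")).unsqueeze(0).to(device)
tags, _ = inference_ram(image, model)  # returns (English tags, Chinese tags)
print(tags)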

6. Install ACE

Run:

cd ../../ace/dsacstar/
conda install opencv
python setup.py install

Wait for the install to finish~

(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/ace/dsacstar$ python setup.py install
Detected active conda environment: /home/lgp/anaconda3/envs/dovsg
Assuming OpenCV dependencies in:
........

........

creating dist
creating 'dist/dsacstar-0.0.0-py3.9-linux-x86_64.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing dsacstar-0.0.0-py3.9-linux-x86_64.egg
creating /home/lgp/anaconda3/envs/dovsg/lib/python3.9/site-packages/dsacstar-0.0.0-py3.9-linux-x86_64.egg
Extracting dsacstar-0.0.0-py3.9-linux-x86_64.egg to /home/lgp/anaconda3/envs/dovsg/lib/python3.9/site-packages
Adding dsacstar 0.0.0 to easy-install.pth file

Installed /home/lgp/anaconda3/envs/dovsg/lib/python3.9/site-packages/dsacstar-0.0.0-py3.9-linux-x86_64.egg
Processing dependencies for dsacstar==0.0.0
Finished processing dependencies for dsacstar==0.0.0
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/ace/dsacstar$

7. Install LightGlue

Pin LightGlue to commit 'edb2b83' so it stays compatible with the other dependencies.

Run:

cd ../../third_party/
git clone https://github.com/cvg/LightGlue.git
cd LightGlue/
git checkout edb2b83
python -m pip install -e .

Wait for the install to finish~

(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party$ git clone https://github.com/cvg/LightGlue.git
Cloning into 'LightGlue'...
remote: Enumerating objects: 386, done.
remote: Counting objects: 100% (205/205), done.
remote: Compressing objects: 100% (119/119), done.
remote: Total 386 (delta 147), reused 86 (delta 86), pack-reused 181 (from 2)
Receiving objects: 100% (386/386), 17.43 MiB | 13.39 MiB/s, done.
Resolving deltas: 100% (236/236), done.
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party$ ls
DROID-SLAM  GroundingDINO  LightGlue  pytorch3d  recognize-anything  segment-anything-2
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party$ cd LightGlue/
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party/LightGlue$ git checkout edb2b83
Note: switching to 'edb2b83'.

..............................

HEAD is now at edb2b83 fix compilation for torch v2.2.1 (#124)

(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party/LightGlue$ python -m pip install -e .
Obtaining file:///home/lgp/2025_project/DovSG/third_party/LightGlue
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done

................

Successfully built lightglue
Installing collected packages: kornia_rs, kornia, lightglue
Successfully installed kornia-0.8.1 kornia_rs-0.1.9 lightglue-0.0
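A quick way to confirm LightGlue works is the matching example from its README; the two image paths below are placeholders:

import torch
from lightglue import LightGlue, SuperPoint
from lightglue.utils import load_image, rbd

device = "cuda" if torch.cuda.is_available() else "cpu"
extractor = SuperPoint(max_num_keypoints=2048).eval().to(device)
matcher = LightGlue(features="superpoint").eval().to(device)

image0 = load_image("path/to/img0.jpg").to(device)  # placeholder paths
image1 = load_image("path/to/img1.jpg").to(device)
feats0 = extractor.extract(image0)
feats1 = extractor.extract(image1)
matches01 = matcher({"image0": feats0, "image1": feats1})
feats0, feats1, matches01 = [rbd(x) for x in (feats0, feats1, matches01)]
print(matches01["matches"].shape)  # (K, 2) keypoint index pairs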

Then install the Faiss library:

conda install -c pytorch faiss-cpu=1.7.4 mkl=2021 blas=1.0=mkl

8. Install PyTorch3d

Pin pytorch3d to commit '05cbea1' so it stays compatible with the other dependencies.

Run:

cd ..
git clone https://github.com/facebookresearch/pytorch3d.git                                        
cd pytorch3d/
git checkout 05cbea1
python setup.py install

Wait for the install to finish~

(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party/pytorch3d$ python setup.py install
......................

Using /home/lgp/anaconda3/envs/dovsg/lib/python3.9/site-packages
Finished processing dependencies for pytorch3d==0.7.7
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party/pytorch3d$
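To verify the pytorch3d CUDA ops compiled correctly, a tiny k-nearest-neighbor query is enough:

import torch
from pytorch3d.ops import knn_points

device = "cuda" if torch.cuda.is_available() else "cpu"
p1 = torch.randn(1, 1000, 3, device=device)  # query points
p2 = torch.randn(1, 2000, 3, device=device)  # reference points
knn = knn_points(p1, p2, K=4)
print(knn.dists.shape)  # (1, 1000, 4)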

9. Install the Remaining Dependencies and the dovsg Package

First install a batch of dependencies:

cd ../../
pip install ipython cmake pybind11 ninja scipy==1.10.1 scikit-learn==1.4.0 pandas==2.0.3 hydra-core opencv-python openai-clip timm matplotlib==3.7.2 imageio timm open3d numpy-quaternion more-itertools pyliblzfse einops transformers pytorch-lightning wget gdown tqdm zmq torch_geometric numpy==1.23.0  # -i https://pypi.tuna.tsinghua.edu.cn/simple

Then install protobuf, MinkowskiEngine, and the graspnet API:

pip install protobuf==3.19.0
pip install git+https://github.com/pccws/MinkowskiEngine
pip install graspnetAPI

torch-cluster is also required (first download the .whl with wget, then install it with pip):

wget https://data.pyg.org/whl/torch-2.3.0%2Bcu121/torch_cluster-1.6.3%2Bpt23cu121-cp39-cp39-linux_x86_64.whl
pip install torch_cluster-1.6.3+pt23cu121-cp39-cp39-linux_x86_64.whl

Install a few more packages:

pip install numpy==1.23.0 supervision==0.14.0 shapely alphashape 
pip install pyrealsense2 open_clip_torch graphviz pyrender
pip install openai==1.56.1
pip install transforms3d==0.3.1 scikit-image==0.19.3

Finally, install dovsg itself:

pip install -e .

Wait for the install to finish~

(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG$ pip install -e .
Obtaining file:///home/lgp/2025_project/DovSG
  Preparing metadata (setup.py) ... done
Installing collected packages: dovsg
  Running setup.py develop for dovsg

Successfully installed dovsg
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG$

Patch (2025-07-04): visualization needs graphviz:

sudo apt-get install graphviz
conda install -c conda-forge graphviz python-graphviz

10. Install DROID-SLAM

DROID-SLAM must be kept separate from DOV-SG, so build it in a fresh Conda environment.

Pin DROID-SLAM to commit 8016d2b so it stays compatible with its dependencies, then run:

cd ./third_party/
git clone https://github.com/princeton-vl/DROID-SLAM.git
cd DROID-SLAM/
git checkout 8016d2b

Wait for the clone to finish~

DROID-SLAM/thirdparty/ must contain the eigen, lietorch, tartanair_tools submodules. From the DROID-SLAM root, run:

git submodule update --init thirdparty/lietorch

This pulls and initializes the thirdparty/lietorch submodule; drop the path argument to fetch all submodules at once.

1. Create the Conda environment

Create a Conda environment named droidenv with Python 3.9, then activate it.

The two commands:

conda create -n droidenv python=3.9 -y
conda activate droidenv

2. Install PyTorch

conda install pytorch=1.10 torchvision torchaudio cudatoolkit=11.3 -c pytorch -y

3. Install dependencies

conda install suitesparse -c conda-forge -y
pip install open3d==0.15.2 scipy opencv-python==4.7.0.72 matplotlib pyyaml==6.0.2 tensorboard # -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install evo --upgrade --no-binary evo
pip install gdown
pip install numpy==1.23.0 numpy-quaternion==2023.0.4

Wait for the installs to finish~

4. Install torch-scatter

wget https://data.pyg.org/whl/torch-1.10.0%2Bcu113/torch_scatter-2.0.9-cp39-cp39-linux_x86_64.whl
pip install torch_scatter-2.0.9-cp39-cp39-linux_x86_64.whl

5. Build DROID-SLAM

Configure the build to use gcc-10/g++-10:

sudo apt install gcc-10 g++-10
export CC=/usr/bin/gcc-10
export CXX=/usr/bin/g++-10

The system default here is CUDA 12.1; temporarily switch to CUDA 11.3.

Note: the switch only lasts for the current shell session; a new terminal reverts to CUDA 12.1.

export CUDA_HOME=/usr/local/cuda-11.3
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH

(droidenv) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party/DROID-SLAM$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Mar_21_19:15:46_PDT_2021
Cuda compilation tools, release 11.3, V11.3.58
Build cuda_11.3.r11.3/compiler.29745058_0

Then build and install DROID-SLAM:

python setup.py install

Wait for the compilation to finish~

11. Download Model Weights

The project uses seven models in total (quite a lot). The versions and download links:

  1. anygrasp: when you obtain an AnyGrasp license, the checkpoints are provided with it.
  2. bert-base-uncased: https://huggingface.co/google-bert/bert-base-uncased
  3. CLIP-ViT-H-14-laion2B-s32B-b79K: https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K
  4. droid-slam: https://drive.google.com/file/u/0/d/1PpqVt1H4maBa_GbPJp4NwxRsd9jk-elh/view?usp=sharing&pli=1
  5. GroundingDINO: https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth and https://github.com/IDEA-Research/GroundingDINO/blob/main/groundingdino/config/GroundingDINO_SwinT_OGC.py
  6. recognize_anything: https://huggingface.co/spaces/xinyu1205/Recognize_Anything-Tag2Text/blob/main/ram_swin_large_14m.pth
  7. segment-anything-2: https://github.com/facebookresearch/sam2?tab=readme-ov-file#download-checkpoints

Expected layout of the weights:

DovSG/
├── checkpoints
│   ├── anygrasp
│   │   ├── checkpoint_detection.tar
│   │   └── checkpoint_tracking.tar
│   ├── bert-base-uncased
│   │   ├── config.json
│   │   ├── model.safetensors
│   │   ├── tokenizer_config.json
│   │   ├── tokenizer.json
│   │   └── vocab.txt
│   ├── CLIP-ViT-H-14-laion2B-s32B-b79K
│   │   └── open_clip_pytorch_model.bin
│   ├── droid-slam
│   │   └── droid.pth
│   ├── GroundingDINO
│   │   ├── groundingdino_swint_ogc.pth
│   │   └── GroundingDINO_SwinT_OGC.py
│   ├── recognize_anything
│   │   └── ram_swin_large_14m.pth
│   └── segment-anything-2
│       └── sam2_hiera_large.pt
└── license
    ├── licenseCfg.json
    ├── ZhijieYan.lic
    ├── ZhijieYan.public_key
    └── ZhijieYan.signature
...

1. Install Git LFS

Downloading the large weights requires the Git LFS tool (for handling large files):

sudo apt-get install git-lfs

After installing, enable LFS support in the terminal:

git lfs install

2. bert-base-uncased weights

Download with:

mkdir checkpoints
cd checkpoints/
git clone https://huggingface.co/google-bert/bert-base-uncased

Wait for the download to finish~

(base) lgp@lgp-MS-7E07:~/2025_project/DovSG$ mkdir checkpoints
(base) lgp@lgp-MS-7E07:~/2025_project/DovSG$ cd checkpoints/
(base) lgp@lgp-MS-7E07:~/2025_project/DovSG/checkpoints$ git clone https://huggingface.co/google-bert/bert-base-uncased
Cloning into 'bert-base-uncased'...
remote: Enumerating objects: 85, done.
remote: Total 85 (delta 0), reused 0 (delta 0), pack-reused 85 (from 1)
Unpacking objects: 100% (85/85), 330.58 KiB | 912.00 KiB/s, done.
(base) lgp@lgp-MS-7E07:~/2025_project/DovSG/checkpoints$ ls
bert-base-uncased

Pull the actual weight files:

cd bert-base-uncased
git lfs pull

3. CLIP-ViT-H-14-laion2B-s32B-b79K weights

Download with:

cd ../
git clone https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K

Wait for the download to finish~

(base) lgp@lgp-MS-7E07:~/2025_project/DovSG/checkpoints$ git clone https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K
Cloning into 'CLIP-ViT-H-14-laion2B-s32B-b79K'...
remote: Enumerating objects: 47, done.
remote: Counting objects: 100% (8/8), done.
remote: Compressing objects: 100% (8/8), done.
remote: Total 47 (delta 2), reused 0 (delta 0), pack-reused 39 (from 1)
Unpacking objects: 100% (47/47), 1.08 MiB | 1.64 MiB/s, done.
(base) lgp@lgp-MS-7E07:~/2025_project/DovSG/checkpoints$ ls
bert-base-uncased  CLIP-ViT-H-14-laion2B-s32B-b79K

Pull the actual weight files:

cd CLIP-ViT-H-14-laion2B-s32B-b79K
git lfs pull

4. droid-slam, GroundingDINO, recognize_anything, and segment-anything-2 weights

Create a folder for each set of weights:

cd ../
mkdir droid-slam
mkdir GroundingDINO
mkdir recognize_anything
mkdir segment-anything-2

These weights have to be downloaded in a browser and then copied into the matching folders:

  1. droid-slam: https://drive.google.com/file/u/0/d/1PpqVt1H4maBa_GbPJp4NwxRsd9jk-elh/view?usp=sharing&pli=1
  2. GroundingDINO: https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth and https://github.com/IDEA-Research/GroundingDINO/blob/main/groundingdino/config/GroundingDINO_SwinT_OGC.py
  3. recognize_anything: https://huggingface.co/spaces/xinyu1205/Recognize_Anything-Tag2Text/blob/main/ram_swin_large_14m.pth
  4. segment-anything-2: https://github.com/facebookresearch/sam2?tab=readme-ov-file#download-checkpoints

12. Download the Dataset

Dataset download: https://drive.google.com/drive/folders/13v5QOrqjxye__kJwDIuD7kTdeSSNfR5x

After downloading, extract it into the DovSG root directory; this creates a data_example directory.

Note: poses_droidslam is generated by a later step, so ignore it for now~

13. Pose Estimation with DROID-SLAM

Switch to the droidenv Conda environment:

conda deactivate 
conda activate droidenv

Next, edit third_party/DROID-SLAM/droid_slam/trajectory_filler.py.

The for loop at line 90 needs to change:

# for (tstamp, image, intrinsic) in image_stream:
for (tstamp, image, pose, intrinsic) in image_stream:
    tstamps.append(tstamp)
    images.append(image)
    intrinsics.append(intrinsic)
    if len(tstamps) == 16:
        pose_list += self.__fill(tstamps, images, intrinsics)
        tstamps, images, intrinsics = [], [], []

The change is needed because image_stream yields four values, so the unpacking must be: for (tstamp, image, pose, intrinsic) in image_stream.

Run pose estimation:

python dovsg/scripts/pose_estimation.py \
    --datadir "data_example/room1" \
    --calib "data_example/room1/calib.txt" \
    --t0 0 \
    --stride 1 \
    --weights "checkpoints/droid-slam/droid.pth" \
    --buffer 2048

After the program finishes, a new folder named poses_droidslam appears under data_example/room1, containing the poses for all viewpoints.

Output:

Pose Estimation:: 100%|██████████████████████████████████████████████████████████| 739/739 [00:25<00:00, 29.32it/s]
################################
Global BA Iteration #1
Global BA Iteration #2
Global BA Iteration #3
Global BA Iteration #4
Global BA Iteration #5
Global BA Iteration #6
Global BA Iteration #7
################################
Global BA Iteration #1
Global BA Iteration #2
Global BA Iteration #3
Global BA Iteration #4
Global BA Iteration #5
Global BA Iteration #6
Global BA Iteration #7
Global BA Iteration #8
Global BA Iteration #9
Global BA Iteration #10
Global BA Iteration #11
Global BA Iteration #12
Result Pose Number is 739

14. Visualize the Reconstructed Scene

Using the poses estimated by DROID-SLAM, visualize the reconstructed scene.

Switch back to the dovsg Conda environment:

conda deactivate 
conda activate dovsg

Reconstruct and view the 3D scene:

python dovsg/scripts/show_pointcloud.py \
    --tags "room1" \
    --pose_tags "poses_droidslam"


15. Run DOV-SG Inference

Run:

python demo.py \
    --tags "room1" \
    --preprocess \
    --debug \
    --task_scene_change_level "Minor Adjustment" \
    --task_description "Please move the red pepper to the plate, then move the green pepper to plate."

The overall flow of the code (a pseudocode sketch follows the list):

  1. Scan the room with a camera to collect RGB-D data.
  2. Estimate camera poses from the collected RGB-D data.
  3. Transform the coordinate frame based on the detected floor.
  4. Train the relocalization model (ACE) to support later steps.
  5. Generate the view dataset.
  6. Represent real-world objects with vision-language models (VLMs) as nodes of a 3D scene graph, and extract inter-object relationships with a rule-based method.
  7. Extract LightGlue features to assist later relocalization.
  8. Feed the result into LLM task planning.
  9. Continuously update the 3D scene graph while executing relocalization subtasks.
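As a rough mental model, the nine steps chain together like the sketch below. Every function is a hypothetical placeholder, not the project's actual API; the real logic lives in demo.py and the dovsg package.

# Placeholder stubs so the sketch runs end to end; all names are invented.
def collect_rgbd_scan(d): return ["frame0", "frame1"]
def estimate_poses(rgbd): return ["pose0", "pose1"]
def align_to_floor(rgbd, poses): return rgbd, poses
def train_ace(rgbd, poses): return "ace-model"
def build_view_dataset(rgbd, poses): return ["view0", "view1"]
def build_scene_graph(views): return {"nodes": [], "edges": []}
def extract_lightglue_features(views): return {}
def llm_plan(graph, task): return [{"action": "Go to", "object1": "red pepper"}]
def execute(subtask, reloc, feats): print("executing", subtask)
def observe(): return "observation"
def update_scene_graph(graph, obs): return graph

def run_demo(scan_dir, task_description):
    rgbd = collect_rgbd_scan(scan_dir)                   # 1. camera scan -> RGB-D
    poses = estimate_poses(rgbd)                         # 2. DROID-SLAM poses
    rgbd, poses = align_to_floor(rgbd, poses)            # 3. floor-based transform
    relocalizer = train_ace(rgbd, poses)                 # 4. ACE relocalization
    views = build_view_dataset(rgbd, poses)              # 5. view dataset
    graph = build_scene_graph(views)                     # 6. VLM nodes + rule-based edges
    feats = extract_lightglue_features(views)            # 7. relocalization features
    plan = llm_plan(graph, task_description)             # 8. LLM task planning
    for subtask in plan:                                 # 9. execute, keep graph fresh
        execute(subtask, relocalizer, feats)
        graph = update_scene_graph(graph, observe())

run_demo("data_example/room1", "Please move the red pepper to the plate.")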

Coordinate transform based on the detected floor:

get floor pcd and transform scene.: 100%|████████████████████████████████████████| 247/247 [00:41<00:00,  5.93it/s]

Training the relocalization model (ACE) to support later steps:
Train ACE
create save folder: data_example/room1/ace
filling training buffers with 1000000/8000000 samples
filling training buffers with 2000000/8000000 samples
filling training buffers with 3000000/8000000 samples
filling training buffers with 4000000/8000000 samples
filling training buffers with 5000000/8000000 samples
filling training buffers with 6000000/8000000 samples
filling training buffers with 7000000/8000000 samples
filling training buffers with 8000000/8000000 samples
Train ACE Over!

More output:

final text_encoder_type: bert-base-uncased
==> Initializing CLIP model...
==> Done initializing CLIP model.
BertLMHeadModel has generative capabilities, as `prepare_inputs_for_generation` is explicitly defined. However, it doesn't directly inherit from `GenerationMixin`. From 👉v4.50👈 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
  - If you're using `trust_remote_code=True`, you can get rid of this warning by loading the model with an auto class. See https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes
  - If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
  - If you are not the owner of the model architecture class, please contact the model code owner to update it.
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
/encoder/layer/0/crossattention/self/query is tied
/encoder/layer/0/crossattention/self/key is tied
/encoder/layer/0/crossattention/self/value is tied
/encoder/layer/0/crossattention/output/dense is tied
/encoder/layer/0/crossattention/output/LayerNorm is tied
/encoder/layer/0/intermediate/dense is tied
/encoder/layer/0/output/dense is tied
/encoder/layer/0/output/LayerNorm is tied
/encoder/layer/1/crossattention/self/query is tied
/encoder/layer/1/crossattention/self/key is tied
/encoder/layer/1/crossattention/self/value is tied
/encoder/layer/1/crossattention/output/dense is tied
/encoder/layer/1/crossattention/output/LayerNorm is tied
/encoder/layer/1/intermediate/dense is tied
/encoder/layer/1/output/dense is tied
/encoder/layer/1/output/LayerNorm is tied
--------------
checkpoints/recognize_anything/ram_swin_large_14m.pth
--------------
load checkpoint from checkpoints/recognize_anything/ram_swin_large_14m.pth
vit: swin_l
semantic meomry: 100%|███████████████████████████████████████████████████████████| 247/247 [04:15<00:00,  1.03s/it]
.........


LLM task-planning output:

[{'action': 'Go to', 'object1': 'red pepper', 'object2': None}, {'action': 'Pick up', 'object1': 'red pepper'}, {'action': 'Go to', 'object1': 'plate', 'object2': None}, {'action': 'Place', 'object1': 'red pepper', 'object2': 'plate'}, {'action': 'Go to', 'object1': 'green pepper', 'object2': None}, {'action': 'Pick up', 'object1': 'green pepper'}, {'action': 'Go to', 'object1': 'plate', 'object2': None}, {'action': 'Place', 'object1': 'green pepper', 'object2': 'plate'}]
Initializing Instance Localizer.


Data process over!


===> get observations from robot.
observation save path: data_example/room1/memory/3_0.1_0.01_True_0.2_0.5/Minor Adjustment long_term_task: Please move the red pepper to the plate, then move the green pepper to plate./step_0/observations/0_start.npy
Sampling 64 hypotheses.
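The plan printed above is just a list of action dicts with 'action', 'object1', and 'object2' keys, so executing it reduces to dispatching on the action field. A toy dispatcher with print-only stubs (hypothetical names, not the project's code):

plan = [
    {"action": "Go to", "object1": "red pepper", "object2": None},
    {"action": "Pick up", "object1": "red pepper"},
    {"action": "Go to", "object1": "plate", "object2": None},
    {"action": "Place", "object1": "red pepper", "object2": "plate"},
]

def navigate_to(obj):
    print(f"navigate to {obj}")          # A* path to the object's node

def pick(obj):
    print(f"pick up {obj}")              # grasping (AnyGrasp in DOV-SG)

def place(obj, target):
    print(f"place {obj} on {target}")

for step in plan:
    if step["action"] == "Go to":
        navigate_to(step["object1"])
    elif step["action"] == "Pick up":
        pick(step["object1"])
    elif step["action"] == "Place":
        place(step["object1"], step["object2"])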

Via ICP matching, the relocalization subtask runs and the 3D scene graph is continuously updated:

IPC Number: 5182, 7589, 6760
IPC Number: 20009, 35374, 27853
IPC Number: 80797, 179609, 129217
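The relocalization step aligns the fresh observation against the stored scene. A minimal point-to-point ICP sketch with Open3D (installed earlier), as a generic illustration rather than the project's exact matching code:

import numpy as np
import open3d as o3d

def icp_align(src_pts, dst_pts, threshold=0.05):
    source = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(src_pts))
    target = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(dst_pts))
    # identity initial guess; the project seeds this from relocalization
    result = o3d.pipelines.registration.registration_icp(
        source, target, threshold, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation, result.fitness

pts = np.random.rand(1000, 3)
shifted = pts + np.array([0.02, 0.0, 0.01])  # small known offset
T, fitness = icp_align(pts, shifted)
print(T, fitness)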

Navigation (green dot: current position; red dot: target; magenta: planned trajectory):

Now are in step 0


Runing Go to(red pepper, None) Task.
A is red pepper
B is None
====> A* planning.
[[2.33353067 0.83389901 3.92763996]
 [2.05       0.55       4.19324287]
 [1.85       0.2        5.09701148]]
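The A* planner searches a 2D occupancy grid from the current pose to the target and returns waypoints like those above. A compact grid A* sketch (not the project's planner):

import heapq

def astar(grid, start, goal):
    """Grid A*: grid[r][c] == 0 means free, 1 means occupied."""
    def h(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])  # Manhattan heuristic

    open_set = [(h(start, goal), 0, start, None)]   # (f, g, node, parent)
    parents, g_best = {}, {start: 0}
    while open_set:
        _, g, node, parent = heapq.heappop(open_set)
        if node in parents:            # already expanded via a better path
            continue
        parents[node] = parent
        if node == goal:               # walk parents back to the start
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (node[0] + dr, node[1] + dc)
            if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                    and grid[nxt[0]][nxt[1]] == 0):
                ng = g + 1
                if ng < g_best.get(nxt, float("inf")):
                    g_best[nxt] = ng
                    heapq.heappush(open_set, (ng + h(nxt, goal), ng, nxt, node))
    return None  # no path found

print(astar([[0, 0, 0],
             [1, 1, 0],
             [0, 0, 0]], (0, 0), (2, 0)))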

The robot locates the objects and performs the manipulation (move the red pepper to the plate, then the green pepper to the plate):

data_example/room1/memory/3_0.1_0.01_True_0.2_0.5/Minor Adjustment long_term_task: Please move the red pepper to the plate, then move the green pepper to plate./step_0/navigation_vis.jpg
please move the agent to target point (Press Enter).
===> get observations from robot.
observation save path: data_example/room1/memory/3_0.1_0.01_True_0.2_0.5/Minor Adjustment long_term_task: Please move the red pepper to the plate, then move the green pepper to plate./step_0/observations/1_after_Go to(red pepper, None).npy

That's all for this walkthrough~

