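0. Background
To run RAGFlow on GPU resources, you can start it with docker-compose-gpu.yml. (However, the official quick-start example targets the x86 platform rather than NVIDIA GPUs, and docker-compose-gpu.yml is maintained by third parties, so it has a few problems.)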
1. Problem
When RAGFlow is started in Docker with docker-compose-gpu.yml, document parsing fails.
Error:
18:10:23 [ERROR][Exception]: NCCL Error 2: unhandled system error (run with NCCL_DEBUG=INFO for details)
2. Solution
(1) Modify docker-compose-gpu.yml (minor changes)
Below is the complete modified docker-compose-gpu.yml, which you can copy directly.
# The RAGFlow team do not actively maintain docker-compose-gpu.yml, so use them at your own risk.
# However, you are welcome to file a pull request to improve it.
include:
  - ./docker-compose-base.yml

services:
  ragflow:
    depends_on:
      mysql:
        condition: service_healthy
    image: ${RAGFLOW_IMAGE}
    container_name: ragflow-server
    ports:
      - ${SVR_HTTP_PORT}:9380
      - 80:80
      - 443:443
    volumes:
      - ./ragflow-logs:/ragflow/logs
      - ./nginx/ragflow.conf:/etc/nginx/conf.d/ragflow.conf
      - ./nginx/proxy.conf:/etc/nginx/proxy.conf
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf
    env_file: .env
    ipc: host
    shm_size: 8g
    environment:
      - TZ=${TIMEZONE}
      - HF_ENDPOINT=${HF_ENDPOINT}
      - MACOS=${MACOS}
      - NCCL_DEBUG=INFO
    networks:
      - ragflow
    restart: on-failure
    # https://docs.docker.com/engine/daemon/prometheus/#create-a-prometheus-configuration
    # If you're using Docker Desktop, the --add-host flag is optional. This flag makes sure that the host's internal IP gets exposed to the Prometheus container.
    extra_hosts:
      - "host.docker.internal:host-gateway"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
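Parameter notes:
ipc: host: lets the container share the host's IPC namespace, which resolves the NCCL multi-GPU communication problem
shm_size: 8g: increases the shared memory size (the 64 MB default is not enough)
NCCL_DEBUG=INFO: turns on the detailed NCCL logging that the error message asks for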
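Because the YAML is easy to damage when copying, it can help to confirm that the two settings are really in the file before restarting. The following is a minimal sketch, not part of RAGFlow or Docker: check_gpu_compose is a hypothetical helper name, and the simple grep patterns assume each key sits on its own line.

```shell
#!/usr/bin/env bash
# Sketch: confirm a compose file contains the two settings discussed above.
# check_gpu_compose is a hypothetical helper, not a RAGFlow or Docker command.
check_gpu_compose() {
  local file="$1" missing=0
  # Look for "ipc: host" at any indentation (commented-out lines do not match).
  if ! grep -Eq '^[[:space:]]*ipc:[[:space:]]*host[[:space:]]*$' "$file"; then
    echo "missing: ipc: host"
    missing=1
  fi
  # Look for a shm_size key that has some value.
  if ! grep -Eq '^[[:space:]]*shm_size:[[:space:]]*[^[:space:]#]+' "$file"; then
    echo "missing: shm_size"
    missing=1
  fi
  return "$missing"
}

# Example (assumes the file is in the current directory):
# check_gpu_compose docker-compose-gpu.yml && echo "settings present"
```

Once the container is running, docker inspect --format '{{.HostConfig.IpcMode}}' ragflow-server should report host if the setting took effect.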
(2) Restart RAGFlow with docker-compose-gpu.yml
docker compose -f docker-compose-gpu.yml up -d
(3) Follow the ragflow-server logs
docker logs -f ragflow-server
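The full log stream can be noisy, so filtering it down to NCCL messages makes it easier to see whether the error is gone. This is a small sketch under the assumption that the relevant lines contain the string "NCCL", as in the error above; nccl_lines is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# nccl_lines is a hypothetical helper: keep only log lines that mention NCCL.
nccl_lines() {
  # grep exits with status 1 when nothing matches; treat that as "no NCCL lines".
  grep -i 'nccl' || true
}

# Usage (2>&1 because docker logs may write container output to stderr):
# docker logs ragflow-server 2>&1 | nccl_lines
```

With NCCL_DEBUG=INFO set, informational NCCL lines are expected here as well, so the thing to look for is the absence of the "NCCL Error" line.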
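(4) Check that document parsing now succeeds
A successful parse looks like this:
[Screenshot: successful parsing result]
With that, the problem is solved!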