Error message
ValueError: Less than two GPU ids were configured and tried to run on on multiple GPUs. Please ensure at least two are specified for `--gpu_ids`, or use `--gpu_ids='all'`.
In other words: fewer than two GPU ids were configured, but the run tried to use multiple GPUs. Make sure at least two ids are given for `--gpu_ids`, or use `--gpu_ids='all'`.
Current settings
(These raise the error: Less than two GPU ids were configured and tried to run on on multiple GPUs)
Changed settings
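With the changed settings the program runs. GPUs 0 and 1 hold most of the memory, but the other cards are also occupied, each showing 3 MB in use.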
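The settings themselves are not reproduced in this post. As a rough sketch, assuming they are passed as command-line flags rather than through a config file, a two-GPU launch on cards 0 and 1 might be assembled as shown here. The GPU ids, the process count, and the use of `--multi_gpu` are assumptions; only `--mixed_precision` and the script path come from the command used later in this post. The command is printed rather than executed, so it can be checked before running.

```shell
# Assumed values: adjust to the GPUs that are actually free.
GPU_IDS="0,1"        # at least two ids are needed for a multi-GPU launch
NUM_PROCESSES=2      # one process per listed GPU

# Assemble the launch command (flags other than --mixed_precision are assumptions).
CMD="accelerate launch --multi_gpu --gpu_ids=$GPU_IDS --num_processes=$NUM_PROCESSES --mixed_precision=fp16 ../kd_yw.py"

# Dry run: print the command, then paste the printed line into the shell to run it.
echo "$CMD"
```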
Next, set the GPU ids to 2,3 and run another program.
This fails with an error saying the port is already in use:
ConnectionError: Tried to launch distributed communication on port `9999`, but another process is utilizing it. Please specify a different port (such as using the `----main_process_port` flag or specifying a different `main_process_port` in your config file) and rerun your script. To automatically use the next open port (on a single node), you can set this to `0`.
Solution
As the error above shows, the port being requested is 9999. Every further program we launch asks for the same port, because our code sets the port to 9999.
Changing the port to 8888 fixes it (confirmed to work). Alternatively, set it to 0 as the message suggests (not verified here).
accelerate launch --mixed_precision="fp16" --main_process_port=8888 ../kd_yw.py
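Putting the two runs side by side: each concurrent job on the same node needs its own GPU ids and its own port. The following is a minimal sketch, not the exact commands used here. The helper name `launch_cmd`, the placeholder script `../first_job.py`, and the `--multi_gpu`, `--gpu_ids`, and `--num_processes` values are assumptions for illustration. Commands are printed rather than executed.

```shell
# Hypothetical helper: print an accelerate launch command for one two-GPU job.
# $1 = comma-separated GPU ids, $2 = main process port, $3 = script path
launch_cmd() {
    echo "accelerate launch --multi_gpu --gpu_ids=$1 --num_processes=2 --mixed_precision=fp16 --main_process_port=$2 $3"
}

# Job 1 is assumed to stay on GPUs 0,1 with port 9999 (first_job.py is a placeholder name).
JOB1=$(launch_cmd "0,1" 9999 ../first_job.py)
# Job 2 runs on GPUs 2,3 and must use a different port, here 8888.
JOB2=$(launch_cmd "2,3" 8888 ../kd_yw.py)

# Dry run: print both commands for inspection.
echo "$JOB1"
echo "$JOB2"
```

Any two ports work as long as they differ and nothing else on the machine is using them.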