[Solved] Accelerate error when configuring single-GPU training, and how to get single-GPU training working

Error message

ValueError: Less than two GPU ids were configured and tried to run on on multiple GPUs. Please ensure at least two are specified for `--gpu_ids`, or use `--gpu_ids='all'`.

In other words, fewer than two GPU ids were configured while Accelerate was trying to run on multiple GPUs. Either specify at least two ids for `--gpu_ids`, or use `--gpu_ids='all'`.

Current settings
(these trigger the error "Less than two GPU ids were configured and tried to run on on multiple GPUs")

Changed settings

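The run looks as follows. Most of the memory is used on GPUs 0 and 1, but roughly 3M of memory is also occupied on each of the other cards.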

Next, set the GPU ids to 2,3 and run a second program.
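
One possible way to keep a second job on cards 2 and 3 only is the standard CUDA environment variable `CUDA_VISIBLE_DEVICES`. This is a different mechanism from the settings used in this post, and whether it removes the small memory footprint seen on the other cards has not been verified here. The helper name `env_for_gpus` and the script name `second_script.py` are hypothetical, used only for illustration. Note that inside the restricted process the visible cards are renumbered starting from 0.

```python
import os
from typing import Dict, Iterable, Mapping, Optional


def env_for_gpus(
    gpu_ids: Iterable[int],
    base_env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Copy an environment and restrict CUDA to the given physical GPU ids.

    `env_for_gpus` is a hypothetical helper, not part of Accelerate.
    """
    env = dict(os.environ if base_env is None else base_env)
    # CUDA only exposes the listed devices to the child process.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


second_job_env = env_for_gpus([2, 3])
print(second_job_env["CUDA_VISIBLE_DEVICES"])

# The environment could then be passed to a launch, for example:
# subprocess.run(["accelerate", "launch", "second_script.py"], env=second_job_env)
```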

The error reports that the port is already in use:
ConnectionError: Tried to launch distributed communication on port `9999`, but another process is utilizing it. Please specify a different port (such as using the `----main_process_port` flag or specifying a different `main_process_port` in your config file) and rerun your script. To automatically use the next open port (on a single node), you can set this to `0`.
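
Before launching a second job, it can help to check whether the port is already taken. The following is a minimal sketch using only the Python standard library. The function name `port_is_free` and the choice of `127.0.0.1` as the host are assumptions for illustration, and this is not necessarily how Accelerate performs its own check.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if `port` can be bound on `host` right now.

    `port_is_free` is a hypothetical helper, not part of Accelerate.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Binding fails when another process already holds the port.
            return False
    return True


# 9999 is the port named in the error message above.
print(port_is_free(9999))
```

If this prints `False` while the first training job is running, the second job needs a different port.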

Solution

The error message shows that the launch tried to use port 9999. Every additional program we start tries the same port, because port 9999 is set in our code, so the second job collides with the first one that is still running.

Changing the port to 8888 fixes this (confirmed to work). Alternatively, set it to 0 as the error message suggests (not verified).

accelerate launch --mixed_precision="fp16"  --main_process_port=8888 ../kd_yw.py
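
Instead of hard-coding 8888, a free port can be requested from the operating system and passed to the same command. This is only a sketch: the helper names `find_free_port` and `build_launch_command` are hypothetical, and the port could in principle be taken by another process between the lookup and the actual launch.

```python
import shlex
import socket
from typing import List


def find_free_port() -> int:
    """Ask the OS for a currently unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


def build_launch_command(script: str, port: int) -> List[str]:
    """Build the launch command from this post with a chosen port."""
    return [
        "accelerate",
        "launch",
        "--mixed_precision=fp16",
        f"--main_process_port={port}",
        script,
    ]


port = find_free_port()
cmd = build_launch_command("../kd_yw.py", port)
# Print the command so it can be pasted into a shell,
# or pass `cmd` to subprocess.run(cmd) to start it directly.
print(shlex.join(cmd))
```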
