PyTorch GPU/CUDA usage and error notes

GPU usage and error notes

1. Using a specific GPU (single card)
- 1.1 Method 1: os.environ['CUDA_VISIBLE_DEVICES']
- 1.2 Method 2: torch.device('cuda:2')
- 1.3 Error 1: RuntimeError: CUDA error: invalid device ordinal CUDA kernel errors might be asynchronously reported
- 1.4 torch.load error: RuntimeError: CUDA out of memory...
2. Using specific GPUs (multi-card DataParallel)
- 2.1 Standard DP usage (untested)
- 2.2 Using DP in PyG

1. Using a specific GPU (single card)

1.1 Method 1: os.environ['CUDA_VISIBLE_DEVICES']

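import os
os.environ['CUDA_VISIBLE_DEVICES'] = '2'  # set before CUDA is initialized (safest: before importing torch)
import torch
model = Net().to('cuda')  # 'cuda' means cuda:0, which is now physical GPU 2
data = data.to('cuda')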
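To see why 'cuda' (that is, cuda:0) ends up on physical GPU 2 here, the renumbering can be mimicked in plain Python. physical_gpu is a hypothetical helper, not a PyTorch function, and it only handles comma-separated integer indices (not GPU UUIDs).

```python
def physical_gpu(logical_index, visible_devices):
    """Map a logical CUDA ordinal (cuda:N) to a physical GPU index.

    visible_devices mimics the CUDA_VISIBLE_DEVICES string, e.g. '2' or '1,3'.
    The visible GPUs are renumbered 0, 1, 2, ... in the order they are listed.
    """
    visible = [int(x) for x in visible_devices.split(',') if x.strip()]
    return visible[logical_index]


print(physical_gpu(0, '2'))    # cuda:0 is physical GPU 2
print(physical_gpu(1, '1,3'))  # cuda:1 is physical GPU 3
```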

1.2 Method 2: torch.device('cuda:2')

import torch
device = torch.device('cuda:2')  # physical GPU 2; CUDA_VISIBLE_DEVICES is left unset
model = Net().to(device)
data = data.to(device)

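1.3 Error 1: RuntimeError: CUDA error: invalid device ordinal CUDA kernel errors might be asynchronously reported

Solution: don't mix Method 1 and Method 2. After os.environ['CUDA_VISIBLE_DEVICES'] = '2', the process sees only one GPU, renumbered as cuda:0, so torch.device('cuda:2') points at a device that does not exist. Use either 'cuda' (or 'cuda:0') with Method 1, or 'cuda:2' without setting CUDA_VISIBLE_DEVICES.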
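The failure can be reproduced without a GPU: once CUDA_VISIBLE_DEVICES='2' is set, the process sees exactly one device, so ordinal 2 is out of range. The sketch below is a plain-Python pre-flight check; visible_count and check_ordinal are hypothetical helpers, and in real code torch.cuda.device_count() would report the visible count directly.

```python
import os


def visible_count(env=None):
    """Number of GPUs a process would see, judging only by CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset (all GPUs visible, count unknown here).
    """
    env = os.environ if env is None else env
    value = env.get('CUDA_VISIBLE_DEVICES')
    if value is None:
        return None
    return len([x for x in value.split(',') if x.strip()])


def check_ordinal(requested, env=None):
    """Raise early instead of hitting 'invalid device ordinal' inside CUDA."""
    count = visible_count(env)
    if count is not None and requested >= count:
        raise ValueError(
            f'cuda:{requested} requested but CUDA_VISIBLE_DEVICES exposes only '
            f'{count} device(s), renumbered from cuda:0'
        )


# Method 1 and Method 2 mixed: only one GPU is visible, so cuda:2 is invalid
try:
    check_ordinal(2, {'CUDA_VISIBLE_DEVICES': '2'})
except ValueError as e:
    print(e)

check_ordinal(0, {'CUDA_VISIBLE_DEVICES': '2'})  # fine: cuda:0 is physical GPU 2
```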

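1.4 torch.load error: RuntimeError: CUDA out of memory…

Solutions:
1) If the GPU genuinely does not have enough memory, the only option is to switch to another GPU.
2) Even when the device is selected with Method 2, torch.load by default restores tensors onto the GPU they were saved from (here GPU 0), so GPU 0 can run out of memory. Either select the GPU with Method 1 (the saved cuda:0 then maps to the only visible GPU), or pass map_location: torch.load(path, map_location=lambda storage, loc: storage.cuda(2))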
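The map_location options can be illustrated with a plain-Python model of the decision torch.load makes for each saved storage. FakeStorage and restore are hypothetical stand-ins (no tensors are loaded). In real code, torch.load also accepts a string such as map_location='cuda:2' or a dict such as map_location={'cuda:0': 'cuda:2'}, which for this case should have the same effect as the lambda above.

```python
class FakeStorage:
    """Stand-in for a tensor storage; only tracks which device it lives on."""

    def __init__(self, location):
        self.location = location

    def cuda(self, index):
        return FakeStorage(f'cuda:{index}')


def restore(saved_location, map_location=None):
    """Roughly mirror how torch.load decides where a saved storage ends up."""
    storage = FakeStorage(saved_location)
    if map_location is None:
        return storage.location  # default: back onto the device it was saved from
    if isinstance(map_location, str):
        return map_location  # every storage goes to this one device
    if isinstance(map_location, dict):
        return map_location.get(saved_location, saved_location)  # remap by name
    return map_location(storage, saved_location).location  # callable form


print(restore('cuda:0'))                                        # cuda:0
print(restore('cuda:0', 'cuda:2'))                              # cuda:2
print(restore('cuda:0', {'cuda:0': 'cuda:2'}))                  # cuda:2
print(restore('cuda:0', lambda storage, loc: storage.cuda(2)))  # cuda:2
```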

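2. Using specific GPUs (multi-card DataParallel)

2.1 Standard DP usage (untested)

DP is simple to use: wrap the model in DataParallel, and the training code stays the same as for a single GPU.
Code: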

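import torch
from torch.nn import DataParallel

model = Model()  # your own nn.Module
model = DataParallel(model, device_ids=[0, 1])
model = model.to('cuda:0')  # parameters must sit on device_ids[0] before the forward pass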
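What DataParallel does on each forward pass can be sketched in plain Python: the input batch is split along dimension 0 into roughly equal chunks (similar to torch.chunk), each chunk goes to one model replica, and the outputs are gathered back on device_ids[0]. split_batch is a hypothetical helper working on a list of samples, not the real scatter implementation.

```python
import math


def split_batch(batch, device_ids):
    """Split a batch (a list of samples) across devices, torch.chunk-style.

    Chunk size is ceil(len(batch) / len(device_ids)), so the last device may get
    fewer samples, and some devices may get none when the batch is small.
    """
    size = math.ceil(len(batch) / len(device_ids))
    chunks = {}
    for i, dev in enumerate(device_ids):
        part = batch[i * size:(i + 1) * size]
        if part:
            chunks[f'cuda:{dev}'] = part
    return chunks


print(split_batch(list(range(10)), [0, 1]))
# {'cuda:0': [0, 1, 2, 3, 4], 'cuda:1': [5, 6, 7, 8, 9]}
print(split_batch(list(range(5)), [0, 1, 2, 3]))
# {'cuda:0': [0, 1], 'cuda:1': [2, 3], 'cuda:2': [4]}
```

The second call shows why the batch size should be at least the number of GPUs, and ideally a multiple of it: otherwise some cards sit idle.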

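2.2 Using DP in PyG

Note: with torch geometric (PyG), use DataParallel from torch_geometric.nn together with DataListLoader (torch_geometric.loader; torch_geometric.data in older versions)!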
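The difference between the regular PyG DataLoader and DataListLoader is what each iteration yields: a collated Batch versus a plain Python list of Data objects, which PyG's DataParallel then distributes across the GPUs itself. The sketch below imitates only the list-yielding behaviour; list_batches is hypothetical, and dicts stand in for Data objects.

```python
def list_batches(dataset, batch_size):
    """Yield consecutive slices of the dataset as plain lists (no collation)."""
    for start in range(0, len(dataset), batch_size):
        yield dataset[start:start + batch_size]


graphs = [{'num_nodes': n} for n in (3, 5, 2, 7, 4)]
for data_list in list_batches(graphs, batch_size=2):
    print(len(data_list), [g['num_nodes'] for g in data_list])
# 2 [3, 5]
# 2 [2, 7]
# 1 [4]
```

In real code this corresponds to something like loader = DataListLoader(dataset, batch_size=32, shuffle=True), with output = model(data_list) for each data_list the loader yields.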
Code:

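import torch
from torch_geometric.nn import DataParallel

device_ids = [0, 2, 3]
# The primary GPU has to be specified (it defaults to GPU 0); without an explicit device you get:
# RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:2 and cuda:0!
device = torch.device(f'cuda:{device_ids[0]}')
model = Net()  # constructor args: input feature dim, hidden feature dim, output feature dim
model = DataParallel(model, device_ids=device_ids)
model.to(device)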
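The rule behind the comment above (the model, and the gathered outputs, live on the first entry of device_ids) can be written as a small check. primary_device is a hypothetical helper, not part of PyTorch or PyG.

```python
def primary_device(device_ids):
    """Return the device string DataParallel treats as primary: device_ids[0]."""
    if not device_ids:
        raise ValueError('device_ids must not be empty')
    if len(set(device_ids)) != len(device_ids):
        raise ValueError(f'duplicate GPU ids in {device_ids}')
    return f'cuda:{device_ids[0]}'


print(primary_device([0, 2, 3]))  # cuda:0
print(primary_device([2, 3]))     # cuda:2, so the model should be moved to cuda:2
```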
