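A Detailed Guide to Using Multiple GPUs

Contents:

Introduction

Logging device placement

Manual device placement

Allowing GPU memory growth

Using a single GPU on a multi-GPU system

Using multiple GPUs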

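1. Introduction

A typical system contains multiple computing devices. The device types supported in TensorFlow include CPU and GPU. They are represented as strings, for example:

"/cpu:0": The CPU of your machine.
"/device:GPU:0": The GPU of your machine, if you have one.
"/device:GPU:1": The second GPU of your machine.

If a TensorFlow operation has both CPU and GPU implementations, the GPU device is given priority when the operation is assigned to a device. For example, matmul has both CPU and GPU kernels. On a system with devices cpu:0 and gpu:0, gpu:0 will be selected to run matmul.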
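The short device strings listed above follow a simple pattern: an optional "device:" prefix, a device type, and an index. The sketch below shows one way to split them apart. The helper name parse_device is hypothetical (it is not a TensorFlow function), and it only handles the short forms used in this article, not fully qualified names such as /job:localhost/replica:0/task:0/device:GPU:0.

```python
import re

# Hypothetical helper, not part of TensorFlow.
# Accepts the short forms used in this article: "/cpu:0", "/device:GPU:1".
_DEVICE_RE = re.compile(r"^/(?:device:)?(cpu|gpu):(\d+)$", re.IGNORECASE)


def parse_device(name):
    """Split a short device string into (device_type, index)."""
    match = _DEVICE_RE.match(name)
    if match is None:
        raise ValueError("unrecognized device string: %r" % name)
    return match.group(1).upper(), int(match.group(2))


print(parse_device("/cpu:0"))         # ('CPU', 0)
print(parse_device("/device:GPU:1"))  # ('GPU', 1)
```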

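2. Logging device placement

To find out which devices your operations and tensors are assigned to, create the session with the log_device_placement configuration option set to True.

import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.ConfigProto)

# Create a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)

# Create a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

# Run the op.
print(sess.run(c))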

You should see the following output:

Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K40c, pci bus id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/device:GPU:0
a: /job:localhost/replica:0/task:0/device:GPU:0
MatMul: /job:localhost/replica:0/task:0/device:GPU:0
[[ 22.  28.]
 [ 49.  64.]]
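When a graph has many operations, it can be easier to read the placement log programmatically. The following is a minimal sketch that turns log text shaped like the output above into a dictionary. The helper parse_placement is hypothetical, the sample text is copied from this article, and the line format is assumed to match what is shown here; other TensorFlow versions may print it differently.

```python
# Sample text copied from the log shown in this article.
SAMPLE_LOG = """\
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K40c, pci bus id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/device:GPU:0
a: /job:localhost/replica:0/task:0/device:GPU:0
MatMul: /job:localhost/replica:0/task:0/device:GPU:0
"""


def parse_placement(log_text):
    """Return {op_name: device} for lines shaped like 'op: /job:...'."""
    placement = {}
    for line in log_text.splitlines():
        name, sep, device = line.partition(": ")
        # Keep only 'op: /job:...' lines; skip the header and device mapping.
        if sep and device.startswith("/job:"):
            placement[name] = device.rsplit("/", 1)[-1]
    return placement


print(parse_placement(SAMPLE_LOG))
# {'b': 'device:GPU:0', 'a': 'device:GPU:0', 'MatMul': 'device:GPU:0'}
```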

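3. Manual device placement

If you would like a particular operation to run on a device of your choice instead of the automatically selected one, you can use tf.device to create a device context. All operations created within that context will have the same device assignment.

# Create a graph. Only a and b are pinned to the CPU; c is created outside the block.
with tf.device('/cpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)

# Create a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

# Run the op.
print(sess.run(c))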

You will see that a and b are now assigned to cpu:0. Since no device was explicitly specified for the matmul operation, TensorFlow chooses one based on the operation and the available devices (gpu:0 in this example) and automatically copies tensors between devices if required.

Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K40c, pci bus id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/cpu:0
a: /job:localhost/replica:0/task:0/cpu:0
MatMul: /job:localhost/replica:0/task:0/device:GPU:0
[[ 22.  28.]
 [ 49.  64.]]

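4. Allowing GPU memory growth

By default, TensorFlow maps nearly all of the GPU memory of all GPUs visible to the process (subject to CUDA_VISIBLE_DEVICES). This is done to reduce memory fragmentation and so use the relatively precious GPU memory on the devices more efficiently.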
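CUDA_VISIBLE_DEVICES is an environment variable holding a comma-separated list of GPU indices; devices not listed are hidden from the process, and the visible ones are renumbered from 0. A minimal sketch of setting it from Python follows. The helper name restrict_visible_gpus is hypothetical, and the call is assumed to happen before TensorFlow (or any other CUDA library) initializes the GPUs, otherwise it has no effect.

```python
import os


# Hypothetical helper: expose only the listed GPU indices to this process.
# Assumption: this runs before TensorFlow initializes any GPU.
def restrict_visible_gpus(indices):
    value = ",".join(str(i) for i in indices)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


print(restrict_visible_gpus([0]))  # 0
```

The same effect can be had from the shell by setting the variable when launching the script, which avoids any ordering concerns inside the program.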

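In some cases it is desirable for the process to allocate only a subset of the available memory, or to grow its memory usage only as needed. TensorFlow provides two Config options on the Session to control this.

The first is the allow_growth option, which attempts to allocate only as much GPU memory as runtime allocations require: it starts out allocating very little memory, and as Sessions run and need more GPU memory, the GPU memory region used by the TensorFlow process is extended. Note that memory is not released, since that can lead to even worse memory fragmentation. To turn this option on, set it in the ConfigProto as follows:

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)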
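To make the grow-but-never-release behaviour concrete, here is a toy model in plain Python. It is only an illustration of the policy described above, not TensorFlow's actual allocator; the class name GrowOnlyPool and the byte counts are made up for the example.

```python
class GrowOnlyPool:
    """Toy model of a reservation that grows on demand and never shrinks."""

    def __init__(self):
        self.reserved = 0  # bytes held by the process
        self.in_use = 0    # bytes currently handed out

    def allocate(self, nbytes):
        self.in_use += nbytes
        # Extend the reservation only when current demand exceeds it.
        self.reserved = max(self.reserved, self.in_use)

    def free(self, nbytes):
        # Freed bytes become reusable, but the reservation is kept.
        self.in_use -= nbytes


pool = GrowOnlyPool()
pool.allocate(100)
pool.allocate(50)
pool.free(120)
pool.allocate(60)
print(pool.reserved, pool.in_use)  # 150 90
```

The reservation peaks at 150 and stays there even though only 90 bytes are in use at the end, which mirrors why a long-running process with allow_growth can still end up holding a large share of GPU memory.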

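The second method is the per_process_gpu_memory_fraction option, which determines the fraction of the total memory that each visible GPU should allocate. For example, you can tell TensorFlow to allocate only 40% of the total memory of each GPU as follows:

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config, ...)

This is useful if you want to truly bound the amount of GPU memory available to the TensorFlow process.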
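As a quick worked example of what a fraction means in bytes, the sketch below computes the cap for one GPU. The helper memory_cap_bytes is hypothetical, and the 10 GiB card size is an assumed figure chosen only to keep the arithmetic easy to follow.

```python
# Hypothetical helper: bytes a process may use on one GPU for a given fraction.
def memory_cap_bytes(total_bytes, fraction):
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_bytes * fraction)


GIB = 1024 ** 3
total = 10 * GIB  # assumed card size, for illustration only
cap = memory_cap_bytes(total, 0.4)
print(cap / GIB)  # 4.0
```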

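5. Using a single GPU on a multi-GPU system

If you have more than one GPU in your system, the GPU with the lowest ID is selected by default. If you would like to run on a different GPU, you need to specify the preference explicitly:

# Create a graph pinned to the third GPU.
with tf.device('/device:GPU:2'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

# Create a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

# Run the op.
print(sess.run(c))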

Because a, b and c are all created inside the tf.device block, all three operations are pinned to /device:GPU:2 rather than to the default gpu:0.

If the device you have specified does not exist, you will get InvalidArgumentError:

InvalidArgumentError: Invalid argument: Cannot assign a device to node 'b':
Could not satisfy explicit device specification '/device:GPU:2'
   [[Node: b = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [3,2]
   values: 1 2 3...>, _device="/device:GPU:2"]()]]

If you would like TensorFlow to automatically choose an existing and supported device to run the operations when the specified one does not exist, you can set allow_soft_placement to True in the configuration option when creating the session.

# Create a graph.
with tf.device('/device:GPU:2'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

# Create a session with allow_soft_placement and log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(
    allow_soft_placement=True, log_device_placement=True))

# Run the op.
print(sess.run(c))
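The difference between strict and soft placement can be summarized with a small plain-Python model. This is a simplified sketch of the behaviour described in this section, not TensorFlow's placement algorithm; the function resolve_device and the device list are hypothetical, and the preference for a GPU in the fallback is an assumption taken from the priority rule in the introduction.

```python
# Simplified model of strict vs. soft placement; not TensorFlow's placer.
def resolve_device(requested, available, allow_soft_placement=False):
    if requested in available:
        return requested
    if allow_soft_placement and available:
        # Fall back to an existing device, preferring a GPU when one exists.
        gpus = [d for d in available if "GPU" in d.upper()]
        return gpus[0] if gpus else available[0]
    raise ValueError(
        "Could not satisfy explicit device specification %r" % requested)


devices = ["/cpu:0", "/device:GPU:0"]  # assumed machine with one GPU
print(resolve_device("/device:GPU:2", devices, allow_soft_placement=True))
# /device:GPU:0
```

Calling resolve_device("/device:GPU:2", devices) without the flag raises an error instead, which corresponds to the InvalidArgumentError case.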

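6. Using multiple GPUs

If you would like to run TensorFlow on multiple GPUs, you can construct your model in a multi-tower fashion, where each tower is assigned to a different GPU. For example:

# Create a graph with one tower per GPU.
c = []
for d in ['/device:GPU:2', '/device:GPU:3']:
    with tf.device(d):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
        c.append(tf.matmul(a, b))

# Sum the tower results on the CPU (named total to avoid shadowing the built-in sum).
with tf.device('/cpu:0'):
    total = tf.add_n(c)

# Create a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

# Run the op.
print(sess.run(total))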

You will see the following output:

Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K20m, pci bus id: 0000:02:00.0
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: Tesla K20m, pci bus id: 0000:03:00.0
/job:localhost/replica:0/task:0/device:GPU:2 -> device: 2, name: Tesla K20m, pci bus id: 0000:83:00.0
/job:localhost/replica:0/task:0/device:GPU:3 -> device: 3, name: Tesla K20m, pci bus id: 0000:84:00.0
Const_3: /job:localhost/replica:0/task:0/device:GPU:3
Const_2: /job:localhost/replica:0/task:0/device:GPU:3
MatMul_1: /job:localhost/replica:0/task:0/device:GPU:3
Const_1: /job:localhost/replica:0/task:0/device:GPU:2
Const: /job:localhost/replica:0/task:0/device:GPU:2
MatMul: /job:localhost/replica:0/task:0/device:GPU:2
AddN: /job:localhost/replica:0/task:0/cpu:0
[[  44.   56.]
 [  98.  128.]]
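The final matrix is simply the same 2x2 product computed once per tower and then added together. The plain-Python sketch below reproduces that arithmetic without TensorFlow; the helper functions matmul and add_n are local stand-ins written for this example, not the TensorFlow operations of the same name.

```python
# Plain-Python check of the arithmetic; no TensorFlow involved.
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)]
            for row in x]


def add_n(matrices):
    """Element-wise sum of equally shaped matrices."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*matrices)]


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # shape [2, 3]
b = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # shape [3, 2]

towers = [matmul(a, b) for _ in range(2)]  # one product per GPU tower
print(add_n(towers))  # [[44.0, 56.0], [98.0, 128.0]]
```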

Translated from:

https://www.tensorflow.org/programmers_guide/using_gpu
