TensorFlow Alchemy (1): Using GPUs

March 12, 2023 · Source: 张天亮

1. Supported devices

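TensorFlow supports two types of devices, CPU and GPU. They are referred to by strings such as:

"/cpu:0": the CPU of your machine
"/gpu:0": the GPU of your machine
"/gpu:1": the second GPU of your machine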

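If a TensorFlow operation has both a CPU and a GPU implementation, the GPU is given priority when the operation is assigned to a device. For example, matmul has both CPU and GPU implementations, so on a system with the devices cpu:0 and gpu:0, gpu:0 is selected to run matmul.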
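The priority rule can be pictured with a toy model. The sketch below is not TensorFlow's placement code: `pick_device` and its `kernels` argument are hypothetical names made up for illustration, and the function only encodes the rule stated in this section (prefer a GPU when the operation has a GPU implementation, otherwise use the CPU).

```python
def pick_device(kernels, available):
    """Toy model of the default placement rule (hypothetical helper).

    kernels:   device types the op is implemented for, e.g. {"cpu", "gpu"}
    available: device names on the machine, e.g. ["/cpu:0", "/gpu:0"]
    """
    # Try GPU first, then CPU, mirroring the priority described in the text.
    for device_type in ("gpu", "cpu"):
        if device_type not in kernels:
            continue
        prefix = "/" + device_type + ":"
        candidates = [d for d in available if d.startswith(prefix)]
        if candidates:
            # Among devices of the same type, take the lowest numeric ID.
            return min(candidates, key=lambda d: int(d.rsplit(":", 1)[1]))
    raise ValueError("no available device implements this op")


print(pick_device({"cpu", "gpu"}, ["/cpu:0", "/gpu:0"]))  # /gpu:0
print(pick_device({"cpu"}, ["/cpu:0", "/gpu:0"]))         # /cpu:0
```

An operation that only has a CPU implementation stays on the CPU even when a GPU is present, which is why the second call returns /cpu:0.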

2. Logging Device placement

To find out which devices your operations and tensors are assigned to, create the session with the log_device_placement configuration option set to True.

# Written for the TensorFlow 1.x graph/session API.
import tensorflow as tf

# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

Output:

Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40c, pci bus
id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/gpu:0
a: /job:localhost/replica:0/task:0/gpu:0
MatMul: /job:localhost/replica:0/task:0/gpu:0
[[ 22.  28.]
 [ 49.  64.]]
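When a graph has many operations, it can help to turn this log into a lookup table. The following is a minimal sketch that assumes the log has exactly the layout printed above (one `name: /job:...` line per operation). `parse_placement` is a hypothetical helper, not part of TensorFlow, and other TensorFlow versions may format these lines differently.

```python
import re

LOG = """\
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40c, pci bus
id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/gpu:0
a: /job:localhost/replica:0/task:0/gpu:0
MatMul: /job:localhost/replica:0/task:0/gpu:0
"""

# An op line looks like "<op name>: /job:.../<device type>:<index>".
OP_LINE = re.compile(r"^(?P<op>\S+): (?P<device>/job:\S+)$")


def parse_placement(log_text):
    """Return {op name: short device name} for every op line in the log."""
    placement = {}
    for line in log_text.splitlines():
        match = OP_LINE.match(line.strip())
        if match:
            # Keep only the last path component, e.g. "gpu:0".
            short_name = match.group("device").rsplit("/", 1)[1]
            placement[match.group("op")] = short_name
    return placement


print(parse_placement(LOG))
```

The "Device mapping" header and the line describing the physical GPU do not match the pattern, so only the three operations end up in the dictionary.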

3. Manual device placement

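If you want a particular operation to run on a device of your choice instead of the automatically selected one, use tf.device to create a device context. All operations created within that context get the same device assignment.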

# Creates a graph.
with tf.device('/cpu:0'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

a and b are now assigned to cpu:0. Since no device was explicitly specified for the MatMul operation, the TensorFlow runtime automatically selects gpu:0 to run it.

Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40c, pci bus
id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/cpu:0
a: /job:localhost/replica:0/task:0/cpu:0
MatMul: /job:localhost/replica:0/task:0/gpu:0
[[ 22.  28.]
 [ 49.  64.]]

4. Allowing GPU memory growth

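By default, TensorFlow maps nearly all of the GPU memory of all GPUs visible to the process (which GPUs are visible is controlled by the CUDA_VISIBLE_DEVICES environment variable). This reduces memory fragmentation and so makes more efficient use of the relatively precious GPU memory on the devices.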
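Since CUDA_VISIBLE_DEVICES decides which GPUs the process can see at all, it is often set from Python before TensorFlow is imported. A minimal sketch, in which the value "0" is only an example:

```python
import os

# Expose only the first physical GPU to this process. The variable has to be
# set before the CUDA runtime is initialized, so do it before the first
# `import tensorflow`. The value "0" is just an example.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# import tensorflow as tf  # import TensorFlow only after the variable is set

visible = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
print(visible)  # ['0']
```

The visible GPUs are renumbered from zero inside the process, so with CUDA_VISIBLE_DEVICES set to "2" the third physical GPU appears to TensorFlow as gpu:0.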

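In some cases it is desirable for the process to allocate only a subset of the available memory, or to grow its memory usage only as needed. TensorFlow provides two Config options on the Session to control this.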

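The first is the allow_growth option, which allocates GPU memory based on runtime allocations only: it starts out allocating very little memory, and as the Session runs and more GPU memory is needed, it extends the GPU memory region used by the TensorFlow process. Note that the memory is not released again, since releasing it could lead to even worse memory fragmentation. To turn this option on, set it in the ConfigProto: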

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)

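The second is the per_process_gpu_memory_fraction option, which determines the fraction of the total memory that each visible GPU should allocate. For example, you can tell TensorFlow to allocate only 40% of the total memory of each GPU with the following settings: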

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config, ...)

This is useful if you want to put a hard upper bound on the amount of GPU memory available to the TensorFlow process.
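In practice the limit is usually known as an absolute size rather than a fraction. The conversion is a single division; the sketch below uses a hypothetical helper `fraction_for_budget` and made-up card sizes purely for illustration.

```python
def fraction_for_budget(budget_mib, total_mib):
    """Fraction of one GPU's memory that corresponds to budget_mib."""
    if budget_mib <= 0 or total_mib <= 0:
        raise ValueError("sizes must be positive")
    return budget_mib / total_mib


# Illustrative numbers only: cap the process at 4096 MiB on a 12288 MiB GPU.
fraction = fraction_for_budget(4096, 12288)
print(round(fraction, 3))  # 0.333
```

The resulting value is what would be assigned to config.gpu_options.per_process_gpu_memory_fraction. Because the same fraction applies to every visible GPU, cards of different sizes end up with different absolute limits.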

5. Using a single GPU on a multi-GPU system

If your system has more than one GPU, the GPU with the lowest ID is selected by default. To run on a different GPU, specify it explicitly:

# Creates a graph.
with tf.device('/gpu:2'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
  c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

If the device you specified does not exist, you get an InvalidArgumentError:

InvalidArgumentError: Invalid argument: Cannot assign a device to node 'b':
Could not satisfy explicit device specification '/gpu:2'
   [[Node: b = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [3,2]
   values: 1 2 3...>, _device="/gpu:2"]()]]

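If you would like TensorFlow to automatically choose an existing and supported device to run the operations when the specified one does not exist, set allow_soft_placement to True in the configuration options when creating the session.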

# Creates a graph.
with tf.device('/gpu:2'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
  c = tf.matmul(a, b)
# Creates a session with allow_soft_placement and log_device_placement set
# to True.
sess = tf.Session(config=tf.ConfigProto(
      allow_soft_placement=True, log_device_placement=True))
# Runs the op.
print(sess.run(c))
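The difference between the hard failure and soft placement can be summarized in a toy model. `resolve_device` below is a hypothetical helper, not TensorFlow code, and its fallback order (first GPU if there is one, otherwise the CPU) is an assumption chosen for the sketch rather than a description of what TensorFlow does internally.

```python
def resolve_device(requested, available, allow_soft_placement=False):
    """Toy model of soft placement (hypothetical helper, not TensorFlow code)."""
    if requested in available:
        return requested
    if not allow_soft_placement:
        # Corresponds to the InvalidArgumentError raised for '/gpu:2'.
        raise ValueError(
            "Could not satisfy explicit device specification %r" % requested)
    # Assumed fallback order for this sketch: first GPU, otherwise the CPU,
    # which is taken to be always present.
    gpus = [d for d in available if d.startswith("/gpu:")]
    return gpus[0] if gpus else "/cpu:0"


devices = ["/cpu:0", "/gpu:0"]
print(resolve_device("/gpu:2", devices, allow_soft_placement=True))  # /gpu:0
```

Calling the function with the default allow_soft_placement=False and the same arguments raises an error instead of returning a device.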

6. Using multiple GPUs

If you want to run TensorFlow on multiple GPUs, you can construct your model in a multi-tower fashion, where each tower is assigned to a different GPU. For example:

# Creates a graph.
c = []
for d in ['/gpu:2', '/gpu:3']:
  with tf.device(d):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
    c.append(tf.matmul(a, b))
with tf.device('/cpu:0'):
  sum = tf.add_n(c)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(sum))

Output:

Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K20m, pci bus
id: 0000:02:00.0
/job:localhost/replica:0/task:0/gpu:1 -> device: 1, name: Tesla K20m, pci bus
id: 0000:03:00.0
/job:localhost/replica:0/task:0/gpu:2 -> device: 2, name: Tesla K20m, pci bus
id: 0000:83:00.0
/job:localhost/replica:0/task:0/gpu:3 -> device: 3, name: Tesla K20m, pci bus
id: 0000:84:00.0
Const_3: /job:localhost/replica:0/task:0/gpu:3
Const_2: /job:localhost/replica:0/task:0/gpu:3
MatMul_1: /job:localhost/replica:0/task:0/gpu:3
Const_1: /job:localhost/replica:0/task:0/gpu:2
Const: /job:localhost/replica:0/task:0/gpu:2
MatMul: /job:localhost/replica:0/task:0/gpu:2
AddN: /job:localhost/replica:0/task:0/cpu:0
[[  44.   56.]
 [  98.  128.]]
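The printed result can be checked by hand without a GPU. The sketch below redoes the arithmetic in plain Python, with nested lists standing in for tensors and a small `matmul` helper written only for this check: each tower computes the same 2x3 by 3x2 product, and the CPU adds the two products element by element.

```python
def matmul(x, y):
    """Multiply two matrices given as nested lists."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)]
            for row in x]


# Same values as the constants in the example, laid out in row-major order.
a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

towers = [matmul(a, b) for _ in range(2)]  # one product per GPU tower
# Element-wise sum over the towers, the job of the add_n op on the CPU.
total = [[sum(vals) for vals in zip(*rows)] for rows in zip(*towers)]

print(towers[0])  # [[22.0, 28.0], [49.0, 64.0]]
print(total)      # [[44.0, 56.0], [98.0, 128.0]]
```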

    Original author: 张天亮
    Original URL: https://zhuanlan.zhihu.com/p/28083241