Using GPUs

1. Specifying a particular GPU or CPU for a TensorFlow computation:

Note: the example machine has a single CPU (index 0) and a single GPU (index 0), and the GPU build of TensorFlow is installed.

This part is organized as follows:

  • Default: GPU #0
  • Specifying GPU #0
  • Specifying CPU #0
  • Specifying GPU #1
  • Specifying CPU #0 + GPU #0

1.1 Default: GPU #0

In [1]: import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally

In [2]: with tf.Session() as sess:
   ...:     matrix1=tf.constant([[3.,3.]])
   ...:     matrix2=tf.constant([[2.],[2.]])
   ...:     product=tf.matmul(matrix1,matrix2)
   ...:     result=sess.run(product)
   ...:     print result
   ...:

I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX 970
major: 5 minor: 2 memoryClockRate (GHz) 1.266
pciBusID 0000:01:00.0
Total memory: 4.00GiB
Free memory: 3.62GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 970, pci bus id: 0000:01:00.0)
[[ 12.]]

1.2 Specifying GPU #0

In [1]: import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally

In [2]: with tf.Session() as sess:
   ...:     with tf.device("/gpu:0"):
   ...:         matrix1=tf.constant([[3.,3.]])
   ...:         matrix2=tf.constant([[2.],[2.]])
   ...:         product=tf.matmul(matrix1,matrix2)
   ...:         result=sess.run(product)
   ...:         print result
   ...:

I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX 970
major: 5 minor: 2 memoryClockRate (GHz) 1.266
pciBusID 0000:01:00.0
Total memory: 4.00GiB
Free memory: 3.55GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 970, pci bus id: 0000:01:00.0)
[[ 12.]]

1.3 Specifying CPU #0

In [1]: import tensorflow as tf

I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally

In [2]: with tf.Session() as sess:
   ...:     with tf.device("/cpu:0"):
   ...:         matrix1=tf.constant([[3.,3.]])
   ...:         matrix2=tf.constant([[2.],[2.]])
   ...:         product=tf.matmul(matrix1,matrix2)
   ...:         result=sess.run(product)
   ...:         print result
   ...:

I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX 970
major: 5 minor: 2 memoryClockRate (GHz) 1.266
pciBusID 0000:01:00.0
Total memory: 4.00GiB
Free memory: 3.62GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 970, pci bus id: 0000:01:00.0)
[[ 12.]]

1.4 Specifying GPU #1

In [1]: import tensorflow as tf
In [2]: with tf.Session() as sess:
   ...:     with tf.device("/gpu:1"):
   ...:         matrix1=tf.constant([[3.,3.]])
   ...:         matrix2=tf.constant([[2.],[2.]])
   ...:         product=tf.matmul(matrix1,matrix2)
   ...:         result=sess.run(product)
   ...:         print result
   ...:
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 970, pci bus id: 0000:01:00.0)
InvalidArgumentError     Traceback (most recent call last)

<ipython-input-4-380488ab0827> in <module>()
      4 matrix2=tf.constant([[2.],[2.]])
      5 product=tf.matmul(matrix1,matrix2)
----> 6 result=sess.run(product)
      7 print result
      8

InvalidArgumentError: Cannot assign a device to node 'MatMul_2': Could not satisfy explicit device specification '/device:GPU:1' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0, /job:localhost/replica:0/task:0/gpu:0
     [[Node: MatMul_2 = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/device:GPU:1"](Const_4, Const_5)]]

 

Note: this error occurs because the machine has only one GPU (index 0), while the computation was explicitly assigned to GPU:1.
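
If you want a request for a non-existent device to fall back gracefully instead of raising this error, the allow_soft_placement option described later on this page can be used. A minimal, hedged sketch on the same single-GPU machine (the log output is omitted here):

import tensorflow as tf

with tf.device("/gpu:1"):                      # this GPU does not exist on this machine
    matrix1 = tf.constant([[3., 3.]])
    matrix2 = tf.constant([[2.], [2.]])
    product = tf.matmul(matrix1, matrix2)

# allow_soft_placement lets TensorFlow pick an available device instead of failing.
config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
with tf.Session(config=config) as sess:
    print(sess.run(product))                   # [[ 12.]]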

1.5 Specifying CPU #0 + GPU #0

In [1]: import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally

In [2]: with tf.Session() as sess:
   ...:     with tf.device("/cpu:0"):
   ...:         matrix1=tf.constant([[3.,3.]])
   ...:         matrix2=tf.constant([[2.],[2.]])
   ...:     with tf.device("/gpu:0"):
   ...:         product=tf.matmul(matrix1,matrix2)
   ...:     result=sess.run(product)
   ...:     print result
   ...:

I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX 970
major: 5 minor: 2 memoryClockRate (GHz) 1.266
pciBusID 0000:01:00.0
Total memory: 4.00GiB
Free memory: 3.62GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 970, pci bus id: 0000:01:00.0)
[[ 12.]]

Of course, you can also write it this way:

In [1]: import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally

In [2]: with tf.device("/cpu:0"):
   ...:     matrix1=tf.constant([[3.,3.]])
   ...:     matrix2=tf.constant([[2.],[2.]])
   ...:

In [3]: with tf.device("/gpu:0"):
   ...:     product=tf.matmul(matrix1,matrix2)
   ...:

In [4]: sess=tf.Session()

I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX 970
major: 5 minor: 2 memoryClockRate (GHz) 1.266
pciBusID 0000:01:00.0
Total memory: 4.00GiB
Free memory: 3.61GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 970, pci bus id: 0000:01:00.0)

In [5]: result=sess.run(product)

In [6]: print result
[[ 12.]]

In [8]: sess.close()

Note: put the graph-building operations inside the with tf.device(...): block, and keep the session launch statements, or statements that do not need compute resources, outside the with block.
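
As a minimal sketch of this pattern (the same toy computation as above, assuming a machine with /gpu:0):

import tensorflow as tf

# Graph construction: pin the ops to a device inside the context manager.
with tf.device("/gpu:0"):
    matrix1 = tf.constant([[3., 3.]])
    matrix2 = tf.constant([[2.], [2.]])
    product = tf.matmul(matrix1, matrix2)

# Launching and running the session needs no device pin, so it stays outside.
sess = tf.Session()
print(sess.run(product))   # [[ 12.]]
sess.close()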

   

Supported devices

On a typical system there are multiple computing devices. TensorFlow supports two kinds of devices, CPU and GPU, which are identified by strings. For example:

  • "/cpu:0": the CPU of your machine.
  • "/gpu:0": the GPU of your machine, if you have one.
  • "/gpu:1": the second GPU of your machine, and so on.

If a TensorFlow operation has both CPU and GPU implementations, the GPU device is given priority when the operation is assigned. For example, matmul has both CPU and GPU kernels; on a system with cpu:0 and gpu:0, the matmul operation will be placed on gpu:0.
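
To see which device strings are actually available on a machine, one option is the device_lib utility from the TF 1.x Python client (this helper may not exist in very old releases):

from tensorflow.python.client import device_lib

# Prints one line per local device, e.g. "/cpu:0 (CPU)" and "/gpu:0 (GPU)".
for d in device_lib.list_local_devices():
    print("%s (%s)" % (d.name, d.device_type))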

Logging device placement

To find out which devices your operations and tensors are assigned to, create the session with the log_device_placement configuration option set to True.

# Create a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Create a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Run the op.
print sess.run(c)

You should see the following output:

Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40c, pci bus
id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/gpu:0
a: /job:localhost/replica:0/task:0/gpu:0
MatMul: /job:localhost/replica:0/task:0/gpu:0
[[ 22.  28.]
 [ 49.  64.]]

Manual device placement

If you would like a particular operation to run on a device of your choice instead of letting the system pick one for you, you can use with tf.device to create a device context; all operations within that context will run on the designated device.

# Create a graph.
with tf.device('/cpu:0'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Create a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Run the op.
print sess.run(c)

You will see that a and b are now assigned to cpu:0, while the matmul still runs on the GPU.

Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40c, pci bus
id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/cpu:0
a: /job:localhost/replica:0/task:0/cpu:0
MatMul: /job:localhost/replica:0/task:0/gpu:0
[[ 22.  28.]
 [ 49.  64.]]
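
Placement constraints set at graph-construction time can also be inspected without running a session, via the standard .device property of a tensor or operation (the exact string format varies across TensorFlow versions). For the graph above, a sketch:

# a was created inside the '/cpu:0' context, so its constraint is recorded.
print(a.device)   # e.g. '/device:CPU:0' (or '/cpu:0' in older releases)
# c was created outside any device context, so no constraint is recorded here;
# its actual placement is decided at run time (gpu:0 in the log above).
print(c.device)   # '' (empty string)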

Using a single GPU on a multi-GPU system

If you have more than one GPU in your system, the GPU with the lowest ID will be selected by default. If you would like to run on a different GPU, you need to specify the preference explicitly:

# Create a graph.
with tf.device('/gpu:2'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
  c = tf.matmul(a, b)
# Create a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Run the op.
print sess.run(c)

If the device you have specified does not exist, you will get an InvalidArgumentError:

InvalidArgumentError: Invalid argument: Cannot assign a device to node 'b':
Could not satisfy explicit device specification '/gpu:2'
   [[Node: b = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [3,2]
   values: 1 2 3...>, _device="/gpu:2"]()]]

If you would like TensorFlow to automatically choose an existing, supported device when the specified one does not exist, you can set allow_soft_placement to True in the configuration option when creating the session.

# Create a graph.
with tf.device('/gpu:2'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
  c = tf.matmul(a, b)
# Create a session with allow_soft_placement and log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(
      allow_soft_placement=True, log_device_placement=True))
# Run the op.
print sess.run(c)

Using multiple GPUs

If you would like to run TensorFlow on multiple GPUs, you can build your model in a multi-tower fashion, where each tower is assigned to a different GPU. For example:

# Create a graph.
c = []
for d in ['/gpu:2', '/gpu:3']:
  with tf.device(d):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
    c.append(tf.matmul(a, b))
with tf.device('/cpu:0'):
  sum = tf.add_n(c)
# Create a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Run the op.
print sess.run(sum)

You will see the following output:

Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K20m, pci bus
id: 0000:02:00.0
/job:localhost/replica:0/task:0/gpu:1 -> device: 1, name: Tesla K20m, pci bus
id: 0000:03:00.0
/job:localhost/replica:0/task:0/gpu:2 -> device: 2, name: Tesla K20m, pci bus
id: 0000:83:00.0
/job:localhost/replica:0/task:0/gpu:3 -> device: 3, name: Tesla K20m, pci bus
id: 0000:84:00.0
Const_3: /job:localhost/replica:0/task:0/gpu:3
Const_2: /job:localhost/replica:0/task:0/gpu:3
MatMul_1: /job:localhost/replica:0/task:0/gpu:3
Const_1: /job:localhost/replica:0/task:0/gpu:2
Const: /job:localhost/replica:0/task:0/gpu:2
MatMul: /job:localhost/replica:0/task:0/gpu:2
AddN: /job:localhost/replica:0/task:0/cpu:0
[[  44.   56.]
 [  98.  128.]]

The cifar10 tutorial is a good example demonstrating how to train with multiple GPUs.
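
The core idea in that tutorial is to compute gradients on each GPU tower and average them on the CPU before applying the update. Below is a heavily simplified, hedged sketch of that pattern; it is not the tutorial's actual code, and the one-variable "model", batch shape, and device list are purely illustrative:

import tensorflow as tf

def tower_loss():
    # Hypothetical one-variable "model", shared by all towers via variable reuse.
    w = tf.get_variable("w", shape=[3, 1])
    x = tf.random_uniform([8, 3])              # stand-in for this tower's batch
    return tf.reduce_mean(tf.square(tf.matmul(x, w)))

opt = tf.train.GradientDescentOptimizer(0.01)
tower_grads = []
with tf.variable_scope(tf.get_variable_scope()):
    for d in ['/gpu:0', '/gpu:1']:
        with tf.device(d):
            tower_grads.append(opt.compute_gradients(tower_loss()))
            tf.get_variable_scope().reuse_variables()   # share w with later towers

with tf.device('/cpu:0'):
    # Average the gradients variable-by-variable across towers, then apply them.
    averaged = []
    for grads_and_vars in zip(*tower_grads):
        grads = [g for g, _ in grads_and_vars]
        averaged.append((tf.add_n(grads) / len(grads), grads_and_vars[0][1]))
    train_op = opt.apply_gradients(averaged)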

http://wiki.jikexueyuan.com/project/tensorflow-zh/how_tos/using_gpu.html

https://learningtensorflow.com/lesson10/

Using your GPU

It’s quite simple really. At least, syntactically. Just change this:

# Setup operations

with tf.Session() as sess:
    # Run your code

To this:

with tf.device("/gpu:0"):
    # Setup operations

with tf.Session() as sess:
    # Run your code

This new line will create a new context manager, telling TensorFlow to perform those actions on the GPU.

Let’s have a look at a concrete example. The code below creates a random matrix with a size given at the command line, and we can run it on either the CPU or the GPU using a command-line option:

import sys
import numpy as np
import tensorflow as tf
from datetime import datetime

device_name = sys.argv[1]  # Choose device from cmd line. Options: gpu or cpu
shape = (int(sys.argv[2]), int(sys.argv[2]))
if device_name == "gpu":
    device_name = "/gpu:0"
else:
    device_name = "/cpu:0"

with tf.device(device_name):
    random_matrix = tf.random_uniform(shape=shape, minval=0, maxval=1)
    dot_operation = tf.matmul(random_matrix, tf.transpose(random_matrix))
    sum_operation = tf.reduce_sum(dot_operation)

startTime = datetime.now()
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as session:
    result = session.run(sum_operation)
    print(result)

# It can be hard to see the results on the terminal with lots of output -- add some newlines to improve readability.
print("\n" * 5)
print("Shape:", shape, "Device:", device_name)
print("Time taken:", datetime.now() - startTime)
print("\n" * 5)

You can run this at the command line with:

python matmul.py gpu 1500

This will use the GPU with a matrix of size 1500 squared. Use the following to do the same operation on the CPU:

python matmul.py cpu 1500

The first thing you’ll notice when running GPU-enabled code is a large increase in output, compared to a normal TensorFlow script. Here is what my computer prints out, before it prints out any result from the operations.


I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX 950M
major: 5 minor: 0 memoryClockRate (GHz) 1.124
pciBusID 0000:01:00.0
Total memory: 3.95GiB
Free memory: 3.50GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 950M, pci bus id: 0000:01:00.0)

If your code doesn’t produce output similar in nature to this, you aren’t running the GPU-enabled TensorFlow. Alternatively, if you get an error such as ImportError: libcudart.so.7.5: cannot open shared object file: No such file or directory, then you haven’t installed the CUDA library properly. In this case, you’ll need to go back and follow the instructions for installing CUDA on your system.

Try running the above code on both the CPU and GPU, increasing the number slowly. Start with 1500, then try 3000, then 4500, and so on. You’ll find that the CPU starts taking quite a long time, while the GPU is really, really fast at this operation!
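
A small, hypothetical helper (call it benchmark_sweep.py, assuming matmul.py from the listing above sits in the same directory) can run that sweep for you:

import subprocess

# Sweep increasing matrix sizes on both devices and let matmul.py report its timings.
for size in ["1500", "3000", "4500"]:
    for device in ["cpu", "gpu"]:
        subprocess.call(["python", "matmul.py", device, size])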

If you have multiple GPUs, you can use any of them. GPUs are zero-indexed – the above code accesses the first GPU. Changing the device to gpu:1 uses the second GPU, and so on. You can also send part of your computation to one GPU and part to another, as sketched below. In addition, you can access the CPUs of your machine in a similar way – just use cpu:0 (or another number).
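
A hedged sketch of that idea, assuming a machine that exposes /gpu:0 and /gpu:1 (the sizes are arbitrary, and allow_soft_placement keeps it runnable on machines with fewer GPUs):

import tensorflow as tf

with tf.device("/gpu:0"):
    a = tf.random_uniform((2000, 2000))
    part1 = tf.reduce_sum(tf.matmul(a, a))     # half of the work on the first GPU
with tf.device("/gpu:1"):
    b = tf.random_uniform((2000, 2000))
    part2 = tf.reduce_sum(tf.matmul(b, b))     # the other half on the second GPU
with tf.device("/cpu:0"):
    combined = part1 + part2                   # combine the partial results on the CPU

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    print(sess.run(combined))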

What types of operations should I send to the GPU?

In general, if the step of the process can be described such as “do this mathematical operation thousands of times”, then send it to the GPU. Examples include matrix multiplication and computing the inverse of a matrix. In fact, many basic matrix operations are prime candidates for GPUs. As an overly broad and simple rule, other operations should be performed on the CPU.
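
For instance, here is a hedged sketch of the kind of work worth pinning to the GPU (a single /gpu:0 is assumed, with allow_soft_placement as a CPU fallback in case a kernel is missing):

import tensorflow as tf

with tf.device("/gpu:0"):
    m = tf.random_uniform([1000, 1000])
    inv = tf.matrix_inverse(m)                 # "do this math thousands of times"-style work
    check = tf.reduce_mean(tf.matmul(m, inv))  # m times its inverse should be close to the identity

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    print(sess.run(check))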

There is also a cost to changing devices and using GPUs. GPUs don’t have direct access to the rest of your computer (except, of course, for the display). Because of this, if you are running a command on a GPU, you need to copy all of the data to the GPU first, then do the operation, then copy the result back to your computer’s main memory. TensorFlow handles this under the hood, so the code is simple, but the work still needs to be performed.

Not all operations can be done on GPUs. If you get the following error, you are trying to do an operation that can’t be done on a GPU:

Cannot assign a device to node 'PyFunc': Could not satisfy explicit device specification '/device:GPU:1' because no devices matching that specification are registered in this process;

If this is the case, you can either manually change the device to a CPU for this operation, or set TensorFlow to automatically change the device in this case. To do this, set allow_soft_placement to True in the configuration, done as part of creating the session. The prototype looks like this:

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)):
    # Run your graph here

I also recommend logging device placement when using GPUs, as this lets you easily debug issues relating to different device usage. This prints the usage of devices to the log, allowing you to see when devices change and how that affects the graph.

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)):
    # Run your graph here