Running an ordinary piece of TensorFlow code on the GPU under Windows 10 produced the following error:
2019-04-02 09:50:47.986024: I C:\users\nwani\_bazel_nwani\swultrt5\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:01:00.0
totalMemory: 11.00GiB freeMemory: 9.10GiB
2019-04-02 09:50:47.991931: I C:\users\nwani\_bazel_nwani\swultrt5\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1435] Adding visible gpu devices: 0
2019-04-02 09:50:48.667536: I C:\users\nwani\_bazel_nwani\swultrt5\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-02 09:50:48.672794: I C:\users\nwani\_bazel_nwani\swultrt5\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:929] 0
2019-04-02 09:50:48.675436: I C:\users\nwani\_bazel_nwani\swultrt5\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:942] 0: N
2019-04-02 09:50:48.678921: I C:\users\nwani\_bazel_nwani\swultrt5\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8795 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-04-02 09:50:51.208473: E C:\users\nwani\_bazel_nwani\swultrt5\execroot\org_tensorflow\tensorflow\stream_executor\cuda\cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_INSTRUCTION
2019-04-02 09:50:51.213582: F C:\users\nwani\_bazel_nwani\swultrt5\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_event_mgr.cc:208] Unexpected Event status: 1
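This happens because the GPU build of TensorFlow on Windows has a bug in tf.one_hot(). The simplest workaround is to place this op on the CPU (this may cost some speed, but the error goes away):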
import tensorflow as tf
# Pin tf.one_hot to the CPU; `a` is your integer index tensor
with tf.device('/cpu:0'):
    b = tf.one_hot(a, 123)
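A minimal, self-contained sketch of the same workaround, assuming TensorFlow 2.x with eager execution (the log above comes from a 1.x build). The helper name `one_hot_on_cpu` and the sample `labels` tensor are hypothetical, chosen only for illustration. Only the one-hot op is pinned to the CPU; ops created outside the `with` block keep their default placement.

```python
import tensorflow as tf

def one_hot_on_cpu(indices, depth):
    """Hypothetical helper: run tf.one_hot on the CPU to sidestep the GPU kernel error."""
    with tf.device('/cpu:0'):
        return tf.one_hot(indices, depth)

# Hypothetical sample labels; in real code these come from your input pipeline.
labels = tf.constant([0, 2, 1])
encoded = one_hot_on_cpu(labels, 3)

# Ops created here, outside the with-block, use the default placement
# (the GPU, if one is visible), so only the one-hot step moves to the CPU.
print(encoded.numpy())
print(encoded.device)
```

In TF 1.x graph mode the same `with tf.device('/cpu:0'):` block applies, but you would fetch `encoded` with `sess.run(encoded)` instead of calling `.numpy()`.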
Some other errors are caused by the same problem, for example this one from Stack Overflow:
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_driver.cc:1177] could not synchronize on CUDA context: CUDA_ERROR_LAUNCH_FAILED :: No stack trace available
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_LAUNCH_FAILED
F c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_util.cc:370] GPU sync failed