Installing the GPU version of TensorFlow on CentOS 7

March 16, 2023 · Source: sudop

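Install Anaconda. Anaconda3 4.2 is recommended because it ships Python 3.5 by default. Downloading from the Tsinghua mirror is fast:
wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-4.2.0-Linux-x86_64.sh
Install Anaconda3 by running:
- Run the installer: bash Anaconda3-4.2.0-Linux-x86_64.sh
- When prompted, press Enter, then type yes to accept the license agreement
- Choose the install location: press Enter to accept the default path, or type a custom path such as /work/anaconda3 and press Enter
- Type yes at the next prompt; the installer then finishes
- At this point Anaconda is not yet active: typing python in the terminal still starts the Python that ships with CentOS, because the updated .bashrc has not taken effect. Run source ~/.bashrc to apply it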

Verify the Python version

Run python to see the Python and Anaconda version information:

Python 3.5.2 |Anaconda 4.2.0 (64-bit)| (default, Jul  2 2016, 17:53:06)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

Verify that Python runs correctly:
```
>>> print("hello world!")
hello world!
```
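
The same check can be scripted. This is a minimal sketch, assuming (as in the banner shown earlier) that the Anaconda build embeds the word "Anaconda" in sys.version. The helper name is_anaconda_build is hypothetical.

```python
import sys


def is_anaconda_build(version_string):
    """Return True if the interpreter version string mentions Anaconda.

    Assumption: this Anaconda release puts "Anaconda" into sys.version,
    as the interpreter banner in this guide suggests.
    """
    return "Anaconda" in version_string


if __name__ == "__main__":
    # Shows which interpreter is first on PATH after sourcing ~/.bashrc.
    print("interpreter:", sys.executable)
    print("Anaconda build:", is_anaconda_build(sys.version))
```

If the second line prints False, the shell is probably still picking up the system Python.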

Install the GPU driver

Check whether the machine has an NVIDIA GPU

$ /usr/sbin/lspci | grep -i nvidia
    Output:
    3b:00.0 3D controller: NVIDIA Corporation GP102GL [Tesla P40] (rev a1)
    d8:00.0 3D controller: NVIDIA Corporation GP102GL [Tesla P40] (rev a1)
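
If the check needs to run on several machines, the lspci output can be filtered in a script. A rough sketch: the function name nvidia_devices is hypothetical, and SAMPLE simply repeats the two lines shown above.

```python
def nvidia_devices(lspci_output):
    """Return (pci_address, description) pairs for lines that mention NVIDIA."""
    devices = []
    for line in lspci_output.splitlines():
        if "nvidia" not in line.lower():
            continue
        # The PCI address is the first whitespace-separated token on the line.
        address, _, description = line.strip().partition(" ")
        devices.append((address, description))
    return devices


# Sample text copied from the lspci output in this guide.
SAMPLE = """\
3b:00.0 3D controller: NVIDIA Corporation GP102GL [Tesla P40] (rev a1)
d8:00.0 3D controller: NVIDIA Corporation GP102GL [Tesla P40] (rev a1)
"""

if __name__ == "__main__":
    for address, description in nvidia_devices(SAMPLE):
        print(address, "->", description)
```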

Disable the bundled nouveau driver
Open /lib/modprobe.d/dist-blacklist.conf and comment out the nvidiafb entry so that it reads
#blacklist nvidiafb, then add the following lines:
```
blacklist nouveau
options nouveau modeset=0
```
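
A small script can confirm the edit before the initramfs is rebuilt. This is only a sketch: the helper missing_blacklist_lines is a hypothetical name, and the path is the one used in this guide.

```python
REQUIRED_LINES = ("blacklist nouveau", "options nouveau modeset=0")


def missing_blacklist_lines(conf_text):
    """Return the required lines that are absent or commented out in conf_text."""
    active = set()
    for raw in conf_text.splitlines():
        line = " ".join(raw.split())  # collapse runs of whitespace
        if line and not line.startswith("#"):
            active.add(line)
    return [line for line in REQUIRED_LINES if line not in active]


if __name__ == "__main__":
    path = "/lib/modprobe.d/dist-blacklist.conf"
    try:
        with open(path) as handle:
            print("missing:", missing_blacklist_lines(handle.read()))
    except OSError as error:
        print("could not read", path, "-", error)
```

An empty list means both entries are present and not commented out.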

Rebuild the initramfs image

mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
dracut /boot/initramfs-$(uname -r).img $(uname -r)

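Switch the default run level to text mode
systemctl set-default multi-user.target
Check whether nouveau has been disabled (the blacklist only takes effect after a reboot)
lsmod | grep nouveau    If nothing is printed, nouveau has been disabled
Switch the default run level back to graphical mode: systemctl set-default graphical.target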
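
The nouveau check can also be done from Python by reading /proc/modules, which is the file lsmod formats. A minimal sketch; module_loaded is a hypothetical helper name.

```python
def module_loaded(name, proc_modules_text):
    """Return True if a kernel module called `name` is listed in /proc/modules text."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The module name is the first column of each line.
        if fields and fields[0] == name:
            return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/modules") as handle:
            loaded = module_loaded("nouveau", handle.read())
        print("nouveau is still loaded" if loaded else "nouveau is not loaded")
    except OSError as error:
        print("could not read /proc/modules:", error)
```
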
Install the nvidia-detect command from the ELRepo repository
Add the repository:
- CentOS 7: rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
- CentOS 6: rpm -Uvh http://www.elrepo.org/elrepo-release-6-6.el6.elrepo.noarch.rpm
- CentOS 5: rpm -Uvh http://www.elrepo.org/elrepo-release-5-5.el5.elrepo.noarch.rpm
Install: yum install nvidia-detect

Check the driver information: nvidia-detect -v

Probing for supported NVIDIA devices...
[10de:1b38] NVIDIA Corporation GP102GL [Tesla P40]
This device requires the current 390.48 NVIDIA driver kmod-nvidia
[10de:1b38] NVIDIA Corporation GP102GL [Tesla P40]
This device requires the current 390.48 NVIDIA driver kmod-nvidia
[102b:0536] Matrox Electronics Systems Ltd. Device 0536
WARNING: Xorg log file /var/log/Xorg.0.log does not exist
WARNING: Unable to determine Xorg ABI compatibility
WARNING: The driver for this device does not support the current Xorg version
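
To pull the recommended driver version out of that output automatically, something like the following works. It is a sketch that assumes the line wording is exactly as shown above ("requires the current ... NVIDIA driver"); other nvidia-detect messages are not handled, and required_driver_versions is a hypothetical name.

```python
import re

# Matches lines such as:
#   This device requires the current 390.48 NVIDIA driver kmod-nvidia
_DRIVER_LINE = re.compile(r"requires the current (\d+(?:\.\d+)*) NVIDIA driver")


def required_driver_versions(detect_output):
    """Return the distinct driver versions named in the output, in first-seen order."""
    versions = []
    for match in _DRIVER_LINE.finditer(detect_output):
        version = match.group(1)
        if version not in versions:
            versions.append(version)
    return versions


# Sample lines copied from the nvidia-detect output in this guide.
SAMPLE = """\
[10de:1b38] NVIDIA Corporation GP102GL [Tesla P40]
This device requires the current 390.48 NVIDIA driver kmod-nvidia
[10de:1b38] NVIDIA Corporation GP102GL [Tesla P40]
This device requires the current 390.48 NVIDIA driver kmod-nvidia
"""

if __name__ == "__main__":
    print(required_driver_versions(SAMPLE))
```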

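390.48 is the driver version to install. The driver can also be downloaded from the NVIDIA website. I could not find this version in the yum repositories, so I downloaded the driver that matches CUDA directly from the NVIDIA website.
Download page
TensorFlow 1.7 supports CUDA 9.0, so download the driver version that matches CUDA 9.0
Driver link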

yum install -y "kernel-devel-uname-r == $(uname -r)"
yum install gcc gcc-c++   # install the gcc and g++ compilers
Install the driver: sh NVIDIA-Linux-x86_64-384.125.run
After the driver installs successfully, run nvidia-smi to view the GPU information

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.46                 Driver Version: 390.46                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P40           Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   29C    P0    50W / 250W |      0MiB / 22919MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P40           Off  | 00000000:D8:00.0 Off |                    0 |
| N/A   32C    P0    50W / 250W |      0MiB / 22919MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
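
The table above is awkward to parse, so for scripted checks nvidia-smi's CSV query mode is easier. A minimal sketch: parse_gpu_rows is a hypothetical helper, and the field list is just one reasonable choice.

```python
import subprocess

# --query-gpu and --format=csv,noheader print one comma-separated line per GPU.
QUERY = ["nvidia-smi",
         "--query-gpu=index,name,driver_version,memory.total",
         "--format=csv,noheader"]


def parse_gpu_rows(csv_text):
    """Turn lines like '0, Tesla P40, 390.46, 22919 MiB' into dictionaries."""
    rows = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        index, name, driver, memory = [field.strip() for field in line.split(",")]
        rows.append({"index": int(index), "name": name,
                     "driver": driver, "memory": memory})
    return rows


if __name__ == "__main__":
    try:
        output = subprocess.check_output(QUERY).decode()
    except (OSError, subprocess.CalledProcessError) as error:
        print("nvidia-smi is not usable:", error)
    else:
        for row in parse_gpu_rows(output):
            print(row)
```

If nvidia-smi is missing or fails, the driver is probably not installed or not loaded.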

Reboot and log in as root
reboot

5. Install CUDA. TensorFlow 1.7 supports CUDA 9.0, so download the matching version

Download page

Installation steps:

rpm -i cuda-repo-rhel7-9-0-local-9.0.176-1.x86_64-rpm
yum clean all
yum install cuda
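
To confirm which CUDA release ended up installed, the version file can be read from a script. This sketch assumes the toolkit lives under /usr/local/cuda and ships a version.txt containing a line like "CUDA Version 9.0.176", which I believe is the layout for toolkits of this generation; parse_cuda_version is a hypothetical name.

```python
import re


def parse_cuda_version(version_text):
    """Extract a version such as '9.0.176' from the text of version.txt, or None."""
    match = re.search(r"CUDA Version (\d+(?:\.\d+)+)", version_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    path = "/usr/local/cuda/version.txt"  # assumed default install location
    try:
        with open(path) as handle:
            print("CUDA version:", parse_cuda_version(handle.read()))
    except OSError as error:
        print("could not read", path, "-", error)
```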

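6. Install cuDNN: register on the NVIDIA site, then download the cuDNN version that matches your CUDA version

Extract the cuDNN archive and copy its files into the CUDA directory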

tar -zxvf cudnn-9.0-linux-x64-v7.1.solitairetheme8
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/ -d
sudo chmod a+r /usr/local/cuda/include/cudnn.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
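
After copying, the cuDNN version can be read back from the header as a sanity check. A sketch, assuming cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL as plain integer macros (as cuDNN 7 does, to my knowledge); parse_cudnn_version is a hypothetical helper.

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from cudnn.h text, or None if a macro is missing."""
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#define\s+" + macro + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    path = "/usr/local/cuda/include/cudnn.h"  # destination used by the cp command
    try:
        with open(path) as handle:
            print("cuDNN version:", parse_cudnn_version(handle.read()))
    except OSError as error:
        print("could not read", path, "-", error)
```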

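Install tensorflow-gpu

With Anaconda3 already installed, install TensorFlow directly with pip

pip install tensorflow-gpu  # installs the latest tensorflow-gpu release by default

To stay on the release this guide targets (1.7, built for CUDA 9.0), pin the version instead: pip install tensorflow-gpu==1.7.0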

Verify that tensorflow-gpu installed successfully

    # python
    Python 3.5.2 |Anaconda 4.2.0 (64-bit)| (default, Jul  2 2016, 17:53:06)
    [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import tensorflow as tf
    >>> hello = tf.constant('Hello, Tensorflow')
    >>> sess = tf.Session()
    2018-04-09 09:58:07.326972: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
    2018-04-09 09:58:09.165200: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties: 
    name: Tesla P40 major: 6 minor: 1 memoryClockRate(GHz): 1.531
    pciBusID: 0000:3b:00.0
    totalMemory: 22.38GiB freeMemory: 22.21GiB
    2018-04-09 09:58:09.383838: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 1 with properties: 
    name: Tesla P40 major: 6 minor: 1 memoryClockRate(GHz): 1.531
    pciBusID: 0000:d8:00.0
    totalMemory: 22.38GiB freeMemory: 22.21GiB
    2018-04-09 09:58:09.387220: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0, 1
    2018-04-09 09:58:10.093199: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
    2018-04-09 09:58:10.093276: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0 1 
    2018-04-09 09:58:10.093289: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N Y 
    2018-04-09 09:58:10.093300: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 1:   Y N 
    2018-04-09 09:58:10.094592: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 21559 MB memory) -> physical GPU (device: 0, name: Tesla P40, pci bus id: 0000:3b:00.0, compute capability: 6.1)
    2018-04-09 09:58:10.352224: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 21559 MB memory) -> physical GPU (device: 1, name: Tesla P40, pci bus id: 0000:d8:00.0, compute capability: 6.1)
    >>> print(sess.run(hello))
    b'Hello, Tensorflow'
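
For a non-interactive check, the GPUs TensorFlow can see may be listed with device_lib from TensorFlow 1.x. A minimal sketch: gpu_device_names is a hypothetical helper, and the import is done lazily so the file still loads on a machine without TensorFlow.

```python
def gpu_device_names(devices):
    """Return the names of the entries whose device_type is 'GPU'."""
    return [device.name for device in devices if device.device_type == "GPU"]


if __name__ == "__main__":
    try:
        # TensorFlow 1.x helper that enumerates local CPU and GPU devices.
        from tensorflow.python.client import device_lib
    except ImportError as error:
        print("TensorFlow is not importable:", error)
    else:
        print(gpu_device_names(device_lib.list_local_devices()))
```

An empty list means TensorFlow imported but found no usable GPU, which usually points back to the driver, CUDA or cuDNN steps.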

The GPU version of TensorFlow is now installed and working.

Troubleshooting

ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory

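The installed CUDA version does not match. The prebuilt tensorflow-gpu 1.7 package is linked against CUDA 9.0 (hence libcublas.so.9.0), so reinstall CUDA 9.0.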

libcudnn.so.7: cannot open shared object file: No such file or directory

The CUDA library path may not be configured correctly; refresh the dynamic linker cache:

sudo ldconfig /usr/local/cuda/lib64
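
Both "cannot open shared object file" errors can be reproduced without TensorFlow by asking the dynamic loader directly. A sketch using ctypes; can_load is a hypothetical helper, and the two library names are the ones from the error messages above.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open library_name."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    for name in ("libcublas.so.9.0", "libcudnn.so.7"):
        print(name, "loads" if can_load(name) else "NOT found by the loader")
```

Running it before and after ldconfig shows whether the cache refresh fixed the lookup.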

Unable to find the kernel source tree for the currently running kernel. Please make sure you have installed the kernel source files for your kernel and that they are properly configured; on Red Hat Linux systems, for example, be sure you have the 'kernel-source' or 'kernel-devel' RPM installed. If you know the correct kernel source files are installed, you may specify the kernel source path with the '--kernel-source-path' command line option.

The kernel-devel version does not match the running kernel. Install the matching package with: yum install -y "kernel-devel-uname-r == $(uname -r)"
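
Whether the headers match the running kernel can be checked before launching the driver installer. This sketch assumes the kernel-devel RPM places its tree under /usr/src/kernels/<release>, which is my understanding of the CentOS layout; has_matching_kernel_devel is a hypothetical name.

```python
import os
import platform


def has_matching_kernel_devel(release, source_dirs):
    """Return True if a source directory named exactly like the kernel release exists."""
    return release in source_dirs


if __name__ == "__main__":
    release = platform.release()  # same string that `uname -r` prints
    root = "/usr/src/kernels"     # assumed install location of kernel-devel
    try:
        installed = os.listdir(root)
    except OSError:
        installed = []
    print("running kernel:", release)
    print("kernel-devel trees:", installed)
    print("match:", has_matching_kernel_devel(release, installed))
```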


    Original author: sudop
    Original URL: https://www.jianshu.com/p/78a936c27ec4