Installing the NVIDIA Driver, CUDA, and cuDNN on Ubuntu 16.04
NVIDIA GPUs are widely used in deep learning for their compute performance, but getting the most out of one requires installing the proper driver and libraries. This article walks through installing the NVIDIA driver, CUDA, and cuDNN on Ubuntu 16.04.
Installing the NVIDIA Driver
1. Disable the Nouveau driver
Before installing the NVIDIA driver, the Nouveau driver must be disabled. Nouveau is the open-source driver for NVIDIA cards; it performs worse than the official driver and conflicts with it during installation. Blacklist it by creating a configuration file:
sudo vim /etc/modprobe.d/blacklist-nouveau.conf
Add the following lines to the file:
blacklist nouveau
options nouveau modeset=0
Save and exit, then regenerate the initramfs so the blacklist takes effect at boot:
sudo update-initramfs -u
Then reboot the machine:
sudo reboot
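Once the system is back up, you can confirm the blacklist took effect: the Nouveau kernel module should no longer be loaded. A quick check:

```shell
# After the reboot, check whether the Nouveau module is loaded.
# "nouveau not loaded" means the blacklist worked.
lsmod | grep nouveau && echo "nouveau still loaded" || echo "nouveau not loaded"
```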
2. Install the NVIDIA driver
After rebooting, open a terminal and run the following to install the driver:
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-384 nvidia-prime
Note: nvidia-384 is one example version; run ubuntu-drivers devices to see the driver recommended for your GPU.
Once the installation finishes, reboot the machine:
sudo reboot
After the reboot, running nvidia-smi should report the driver version and list your GPU, confirming the driver is active.
Installing CUDA
1. Download the CUDA Toolkit
Visit the NVIDIA website (https://developer.nvidia.com/cuda-downloads) and download the CUDA Toolkit build for Ubuntu 16.04 (CUDA 8.0 was the usual pairing for this release). Choose the runfile (local) installer. Note that the runfile is executed directly rather than unpacked by hand.
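With the runfile downloaded, the toolkit is installed by running it with root privileges. A minimal sketch, assuming the CUDA 8.0 runfile name shown below (substitute the exact filename you downloaded), and declining the bundled driver since one was already installed via the PPA:

```shell
# Make the installer executable and run it.
# The filename below is an example -- use the runfile you downloaded.
chmod +x cuda_8.0.61_375.26_linux.run
sudo sh cuda_8.0.61_375.26_linux.run
```

During the interactive prompts, answer "no" when asked to install the NVIDIA driver, and accept the default toolkit location (/usr/local/cuda-8.0).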
2. Configure environment variables
Open a terminal and set the environment variables:
export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
Note: replace "8.0" in the commands above with the CUDA version you actually installed.
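These exports only last for the current shell session. A common convention is to append them to ~/.bashrc so every new shell picks them up automatically (again, adjust cuda-8.0 to your version):

```shell
# Persist the CUDA paths in ~/.bashrc (replace cuda-8.0 with your version).
echo 'export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc
# Open a new terminal (or run `source ~/.bashrc`) for the change to take effect.
```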
3. Verify the CUDA installation
Run the following to verify that CUDA installed correctly:
nvcc --version
If CUDA's version information is printed, the installation succeeded.
Installing cuDNN
1. Download the cuDNN library
Visit the NVIDIA website (https://developer.nvidia.com/rdp/cudnn-archive); an NVIDIA developer account is required. Download the cuDNN release that matches your installed CUDA version. The download is a tar archive ("cuDNN Library for Linux") whose contents are copied into the CUDA installation directory rather than installed by a package manager.
2. Configure environment variables
Open a terminal and set the environment variables:
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/extras/CUPTI/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib:$LD_LIBRARY_PATH
Note: replace "8.0" above with your installed CUDA version, and make sure the directory containing the cuDNN libraries is on LD_LIBRARY_PATH.
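Setting LD_LIBRARY_PATH is not enough by itself: the cuDNN files must first be copied into the CUDA installation. A minimal sketch, assuming a cuDNN 6.0 archive for CUDA 8.0 (the filename is an example; use the archive you actually downloaded):

```shell
# Unpack the cuDNN archive; it extracts into a local "cuda/" directory.
tar -xzvf cudnn-8.0-linux-x64-v6.0.tgz
# Copy the header and libraries into the CUDA installation.
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/
# Make them readable by all users.
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
```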
Testing the Installation
1. Compile a small program to check that CUDA and cuDNN are working. Create a file named vector_add.cu with the following contents:
```c++
#include <iostream>
#include <cuda_runtime.h>
#include <cudnn.h>

#define N (1 << 20)       // number of vector elements
#define BLOCK_SIZE 256    // threads per block

// Element-wise vector addition: z = x + y
__global__ void vectorAdd(const float *x, const float *y, float *z, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        z[i] = x[i] + y[i];
    }
}

int main() {
    size_t size = N * sizeof(float);

    // Host buffers (heap-allocated; N floats would overflow the stack)
    float *h_x = new float[N];
    float *h_y = new float[N];
    float *h_z = new float[N];
    for (int i = 0; i < N; i++) {
        h_x[i] = 1.0f;
        h_y[i] = 2.0f;
    }

    // Device buffers
    float *d_x, *d_y, *d_z;
    cudaMalloc((void **)&d_x, size);
    cudaMalloc((void **)&d_y, size);
    cudaMalloc((void **)&d_z, size);
    cudaMemcpy(d_x, h_x, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, h_y, size, cudaMemcpyHostToDevice);

    // Launch enough blocks of BLOCK_SIZE threads to cover all N elements
    int blocks = (N + BLOCK_SIZE - 1) / BLOCK_SIZE;
    vectorAdd<<<blocks, BLOCK_SIZE>>>(d_x, d_y, d_z, N);
    cudaMemcpy(h_z, d_z, size, cudaMemcpyDeviceToHost);

    // Spot-check the result instead of printing a million values
    bool ok = true;
    for (int i = 0; i < N; i++) {
        if (h_z[i] != 3.0f) { ok = false; break; }
    }
    std::cout << "CUDA vector add: " << (ok ? "PASS" : "FAIL") << std::endl;

    // Verify that cuDNN links and initializes
    cudnnHandle_t cudnn;
    if (cudnnCreate(&cudnn) == CUDNN_STATUS_SUCCESS) {
        std::cout << "cuDNN initialized, version " << cudnnGetVersion() << std::endl;
        cudnnDestroy(cudnn);
    }

    cudaFree(d_x); cudaFree(d_y); cudaFree(d_z);
    delete[] h_x; delete[] h_y; delete[] h_z;
    return 0;
}
```
Compile and run it, linking against cuDNN so the handle test works:
nvcc vector_add.cu -o vector_add -lcudnn
./vector_add
If the program prints PASS and a cuDNN version number, both CUDA and cuDNN are installed correctly.
Original article by K-seo. If reposting, please credit the source: https://www.kdun.cn/ask/326475.html