How to Install CUDA on Ubuntu 20.04
Install the GPU driver
    exxk@exxk:~$ sudo apt update && sudo apt upgrade -y
    exxk@exxk:~$ sudo apt install -y curl wget gnupg lsb-release
    exxk@exxk:~$ sudo add-apt-repository ppa:graphics-drivers/ppa
    exxk@exxk:~$ sudo apt update
    exxk@exxk:~$ sudo apt install -y ubuntu-drivers-common
    exxk@exxk:~$ ubuntu-drivers devices
    ERROR:root:could not open aplay -l
    Traceback (most recent call last):
      File "/usr/share/ubuntu-drivers-common/detect/sl-modem.py", line 35, in detect
        aplay = subprocess.Popen(
      File "/usr/lib/python3.8/subprocess.py", line 858, in __init__
        self._execute_child(args, executable, preexec_fn, close_fds,
      File "/usr/lib/python3.8/subprocess.py", line 1704, in _execute_child
        raise child_exception_type(errno_num, err_msg, err_filename)
    FileNotFoundError: [Errno 2] No such file or directory: 'aplay'
    == /sys/devices/pci0000:00/0000:00:10.0 ==
    modalias : pci:v000010DEd000025E2sv000017AAsd0000382Dbc03sc00i00
    vendor   : NVIDIA Corporation
    driver   : nvidia-driver-470 - distro non-free
    driver   : nvidia-driver-535-server - distro non-free
    driver   : nvidia-driver-535 - distro non-free
    driver   : nvidia-driver-550-open - third-party non-free
    driver   : nvidia-driver-535-open - distro non-free
    driver   : nvidia-driver-550 - third-party non-free
    driver   : nvidia-driver-560 - third-party non-free recommended
    driver   : nvidia-driver-535-server-open - distro non-free
    driver   : nvidia-driver-545-open - third-party non-free
    driver   : nvidia-driver-470-server - distro non-free
    driver   : nvidia-driver-555-open - third-party non-free
    driver   : nvidia-driver-545 - third-party non-free
    driver   : nvidia-driver-555 - third-party non-free
    driver   : nvidia-driver-560-open - third-party non-free
    driver   : xserver-xorg-video-nouveau - distro free builtin
    exxk@exxk:~$ sudo apt install -y nvidia-driver-560 --no-install-recommends
    exxk@exxk:~$ sudo reboot
    exxk@exxk:~$ nvidia-smi
    Tue Nov 19 07:27:03 2024
    +-----------------------------------------------------------------------------------------+
    | NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
    |-----------------------------------------+------------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
    |                                         |                        |               MIG M. |
    |=========================================+========================+======================|
    |   0  NVIDIA GeForce RTX 3050 ...    Off |   00000000:00:10.0 Off |                  N/A |
    | N/A   45C    P8              3W /   60W |      15MiB /   4096MiB |      0%      Default |
    |                                         |                        |                  N/A |
    +-----------------------------------------+------------------------+----------------------+

    +-----------------------------------------------------------------------------------------+
    | Processes:                                                                              |
    |  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
    |        ID   ID                                                               Usage      |
    |=========================================================================================|
    |    0   N/A  N/A       933      G   /usr/lib/xorg/Xorg                              4MiB |
    +-----------------------------------------------------------------------------------------+
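The `ubuntu-drivers devices` listing above marks one package as recommended. Here is a minimal sketch for pulling that package name out of the listing, assuming the line layout shown above. The function name `recommended_driver` is my own and not part of any tool.

```shell
# Hypothetical helper: read `ubuntu-drivers devices` output on stdin and
# print the package name from the "driver" line that ends in "recommended".
recommended_driver() {
    awk '$1 == "driver" && $NF == "recommended" { print $3 }'
}

# Example (needs a machine with ubuntu-drivers-common installed):
#   ubuntu-drivers devices 2>/dev/null | recommended_driver
```

Fed the listing shown above, the filter selects the `nvidia-driver-560` line. The traceback lines are ignored because their first field is not `driver`.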
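Install Docker (if you ticked Docker in the system installer, this step can be skipped, but I recommend doing it anyway, because the Docker version the installer provides is older)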
    exxk@exxk:~$ sudo apt install -y apt-transport-https ca-certificates curl software-properties-common
    exxk@exxk:~$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
    exxk@exxk:~$ echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
    exxk@exxk:~$ sudo apt update
    exxk@exxk:~$ sudo apt install -y docker-ce docker-ce-cli containerd.io
    exxk@exxk:~$ sudo usermod -aG docker $USER
    exxk@exxk:~$ sudo reboot
    exxk@exxk:~$ docker version
    Client: Docker Engine - Community
     Version:           27.3.1
     API version:       1.47
     Go version:        go1.22.7
     Git commit:        ce12230
     Built:             Fri Sep 20 11:41:03 2024
     OS/Arch:           linux/amd64
     Context:           default

    Server: Docker Engine - Community
     Engine:
      Version:          27.3.1
      API version:      1.47 (minimum version 1.24)
      Go version:       go1.22.7
      Git commit:       41ca978
      Built:            Fri Sep 20 11:41:03 2024
      OS/Arch:          linux/amd64
      Experimental:     false
     containerd:
      Version:          1.7.23
      GitCommit:        57f17b0a6295a39009d861b89e3b3b87b005ca27
     runc:
      Version:          1.1.14
      GitCommit:        v1.1.14-0-g2c9f560
     docker-init:
      Version:          0.19.0
      GitCommit:        de40ad0
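The `--gpus` flag used in the next section needs Docker 19.03 or newer, which is one reason an old bundled Docker can cause trouble. Below is a small sketch of a version check, assuming GNU `sort -V` is available (it is on Ubuntu). `version_ge` is a hypothetical helper name, not a Docker command.

```shell
# Hypothetical helper: succeed when version $1 >= version $2.
# Relies on GNU sort's -V (version sort); the smaller version sorts first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example (needs a running Docker daemon):
#   server=$(docker version --format '{{.Server.Version}}')
#   version_ge "$server" 19.03 && echo "Docker supports --gpus"
```

With the 27.3.1 server shown above, the check passes.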
Install the NVIDIA Container Toolkit
    exxk@exxk:~$ curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
      && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
        sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
        sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
    exxk@exxk:~$ sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list
    exxk@exxk:~$ sudo apt-get update
    exxk@exxk:~$ sudo apt-get install -y nvidia-container-toolkit
    exxk@exxk:~$ sudo nvidia-ctk runtime configure --runtime=docker
    exxk@exxk:~$ sudo systemctl restart docker
    exxk@exxk:~$ docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu20.04 nvidia-smi
    Tue Nov 19 07:58:08 2024
    +-----------------------------------------------------------------------------------------+
    | NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
    |-----------------------------------------+------------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
    |                                         |                        |               MIG M. |
    |=========================================+========================+======================|
    |   0  NVIDIA GeForce RTX 3050 ...    Off |   00000000:00:10.0 Off |                  N/A |
    | N/A   39C    P8              3W /   60W |      15MiB /   4096MiB |      0%      Default |
    |                                         |                        |                  N/A |
    +-----------------------------------------+------------------------+----------------------+

    +-----------------------------------------------------------------------------------------+
    | Processes:                                                                              |
    |  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
    |        ID   ID                                                               Usage      |
    |=========================================================================================|
    +-----------------------------------------------------------------------------------------+
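The `CUDA Version` field in the nvidia-smi header is the highest CUDA version the installed driver supports, which matters when picking an image tag. Here is a sketch that extracts it from the header text, assuming the header layout shown above. `smi_cuda_version` is a made-up name for illustration.

```shell
# Hypothetical helper: read nvidia-smi output on stdin and print the value
# of the "CUDA Version:" header field (prints nothing if the field is absent).
smi_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p'
}

# Example (needs a working NVIDIA driver):
#   nvidia-smi | smi_cuda_version
```

The same filter works on the output of the containerized `nvidia-smi` run, since the header is reported by the host driver either way.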
Using CUDA
How to choose an image
NVIDIA provides the official nvidia/cuda Docker images for building and running GPU-accelerated containerized applications.
Image tag format: nvidia/cuda:<CUDA version>[-cudnn]-<image type>-<base OS>
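Image types:

| Name | Tag | Description | Size |
| --- | --- | --- | --- |
| Base image | nvidia/cuda:<version>-base | Minimal image with only the CUDA runtime (cudart); no math libraries and no development tools | ~100 MB |
| Runtime image | nvidia/cuda:<version>-runtime | For deploying applications that are already compiled (no development tools) | > 2 GB |
| Devel image | nvidia/cuda:<version>-devel | For developing and debugging CUDA applications | > 3 GB |
| Full image | nvidia/cuda:<version>-full | Everything in the devel image, plus extra sample code and documentation | not available |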
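CUDA version: the version must be one the host driver supports. Drivers are generally backward compatible, so a newer driver can run images built on older CUDA versions. My driver reports CUDA 12.6, so CUDA 12.6 is the recommended choice here; other CUDA 12.x or 11.x images can also be used.
cuDNN: the image includes NVIDIA's cuDNN library (a deep learning acceleration library), a core component of deep learning frameworks such as TensorFlow and PyTorch.
Base OS: matching the host system is recommended. For example, use an Ubuntu image on an Ubuntu host and a CentOS image on a CentOS host.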
Based on my environment, the recommended image is: docker pull nvidia/cuda:12.6.2-cudnn-devel-ubuntu20.04
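The naming pattern described above can be sketched as a small tag builder. This is only an illustration: `cuda_image_tag` is a hypothetical helper, and it does not check that the resulting tag actually exists on Docker Hub.

```shell
# Hypothetical helper: build an nvidia/cuda image reference.
#   $1 = CUDA version (e.g. 12.6.2)
#   $2 = image type   (base | runtime | devel)
#   $3 = base OS      (e.g. ubuntu20.04)
#   $4 = pass "cudnn" to select the cuDNN variant (optional)
cuda_image_tag() {
    if [ "${4:-}" = "cudnn" ]; then
        printf 'nvidia/cuda:%s-cudnn-%s-%s\n' "$1" "$2" "$3"
    else
        printf 'nvidia/cuda:%s-%s-%s\n' "$1" "$2" "$3"
    fi
}

# Example:
#   docker pull "$(cuda_image_tag 12.6.2 devel ubuntu20.04 cudnn)"
```

Called with `12.6.2 devel ubuntu20.04 cudnn`, it produces the image name recommended above.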
Usage

    docker run -it --gpus all --name cuda-dev \
      -v /home/exxk/workspace:/workspace \
      -w /workspace \
      -p 8888:8888 \
      nvidia/cuda:12.6.2-cudnn-devel-ubuntu20.04 bash
    root@287c0288ad93:/workspace# apt update
    root@287c0288ad93:/workspace# apt install -y python3 python3-pip