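Changing the Kernel on Production Instances That Use the OFED Driver
Overview
Some BCC/EBC images in production ship with the OFED driver preinstalled. An instance created from a flavor with RDMA networking can therefore use RDMA right away. However, OFED is compiled against the kernel version of the OS image. If the user changes the kernel version, the RDMA-capable NIC may no longer be recognized, and the OFED driver then has to be recompiled and reinstalled.
This document describes how to reinstall OFED after changing the kernel version on instances running different operating systems.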
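To tell whether the OFED driver still matches a kernel, you can check where the mlx5_core module for that kernel comes from. The sketch below assumes that the module locations shown in this document's verification output mark an OFED build: `extra/mlnx-ofa_kernel/` on the RHEL family and `updates/dkms/` on Ubuntu. Any other location is treated as the distribution's inbox driver. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: does the mlx5_core module resolved for a kernel come from an OFED build?
# Assumption: OFED installs mlx5_core under extra/mlnx-ofa_kernel/ (RHEL family)
# or updates/dkms/ (Ubuntu), as in the sample modinfo output in this document.

# is_ofed_module_path <module-path> <kernel-version>
# Returns 0 if the path looks like an OFED build for exactly that kernel.
is_ofed_module_path() {
    local path=$1 kver=$2
    case "$path" in
        /lib/modules/"$kver"/extra/mlnx-ofa_kernel/*) return 0 ;;
        /lib/modules/"$kver"/updates/dkms/*)          return 0 ;;
        *)                                            return 1 ;;
    esac
}

# check_ofed_for_kernel [kernel-version]
# Asks modinfo which mlx5_core file would be used for the kernel
# (default: the running kernel) and reports whether it is an OFED build.
check_ofed_for_kernel() {
    local kver=${1:-$(uname -r)} path
    if ! path=$(modinfo -k "$kver" -F filename mlx5_core 2>/dev/null); then
        echo "mlx5_core not found for kernel $kver" >&2
        return 2
    fi
    if is_ofed_module_path "$path" "$kver"; then
        echo "OFED mlx5_core for $kver: $path"
    else
        echo "mlx5_core for $kver is not an OFED build: $path" >&2
        return 1
    fi
}
```

For example, running `check_ofed_for_kernel 3.10.0-1160.90.1.el7.x86_64` before rebooting shows whether an OFED build already exists for the target kernel.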
Procedure
CentOS 7
This example upgrades the kernel to kernel-3.10.0-1160.90.1.el7.x86_64.
1. Install the kernel and kernel development packages. On the instance, install the kernel-3.10.0-1160.90.1.el7.x86_64 kernel and the matching development packages:
yum install -y kernel-3.10.0-1160.90.1.el7.x86_64 \
    kernel-devel-3.10.0-1160.90.1.el7.x86_64 \
    kernel-headers-3.10.0-1160.90.1.el7.x86_64 \
    kernel-tools-3.10.0-1160.90.1.el7.x86_64
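All four package names in step 1 come from the same kernel version string. The sketch below derives them from that string so the version is typed only once. It also checks for the /lib/modules/<kver>/build directory that the build script in step 4 requires. The function names and the optional modules-root argument, which exists only so the check can run against a scratch directory, are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: derive the step-1 package names from one kernel version and check
# that the kernel build directory needed by the OFED build is present.

# rhel_kernel_pkgs <kernel-version>: print one package name per line
rhel_kernel_pkgs() {
    local kver=$1 p
    for p in kernel kernel-devel kernel-headers kernel-tools; do
        printf '%s-%s\n' "$p" "$kver"
    done
}

# has_build_dir <kernel-version> [modules-root]
# /lib/modules/<kver>/build only resolves to a directory once the matching
# kernel-devel package is installed; a dangling symlink fails the -d test.
has_build_dir() {
    [ -d "${2:-/lib/modules}/$1/build" ]
}
```

Usage: `kver=3.10.0-1160.90.1.el7.x86_64; yum install -y $(rhel_kernel_pkgs "$kver"); has_build_dir "$kver" || echo "kernel-devel-$kver is missing"`.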
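2. Check for the lossless RDMA configuration package. Check whether the lossless RDMA configuration package rdma-userspace-config is installed on the instance. If it is, remove it now and reinstall it after OFED has been installed. Run the following command:
# Check whether rdma-userspace-config is installed
rpm -qa | grep -i rdma-userspace-config
If the package is installed, remove it:
rpm -e rdma-userspace-config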
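You can script the check-then-remove step together with the later reinstall, so that the reinstall runs only when something was actually removed. This is a sketch under two assumptions. First, the marker file path is a hypothetical choice. Second, the package to remove is taken from the exact name-version-release strings that `rpm -qa` prints, not from the bare name `rdma-userspace-config`. Judging from the RPM file name used for the reinstall, the installed package name may carry a suffix such as `-bbc`. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: remove rdma-userspace-config only if it is installed, and leave a
# marker so the reinstall in step 4 runs only when something was removed.
# RDMA_CFG_MARKER is a hypothetical path; any writable location works.
RDMA_CFG_MARKER=${RDMA_CFG_MARKER:-/var/tmp/rdma-userspace-config.removed}

# remove_rdma_cfg_if_present: remove every installed package whose name
# matches rdma-userspace-config, using the exact strings rpm -qa prints.
remove_rdma_cfg_if_present() {
    local pkgs
    pkgs=$(rpm -qa | grep -i rdma-userspace-config)
    if [ -z "$pkgs" ]; then
        echo "rdma-userspace-config is not installed, nothing to remove"
        return 0
    fi
    # Unquoted on purpose: one rpm argument per matched package.
    rpm -e $pkgs || return 1
    touch "$RDMA_CFG_MARKER"
    echo "removed: $pkgs"
}

# rdma_cfg_needs_reinstall: succeeds if a removal was recorded earlier
rdma_cfg_needs_reinstall() {
    [ -f "$RDMA_CFG_MARKER" ]
}
```

In step 4, `rdma_cfg_needs_reinstall && rpm -ivh rdma-userspace-config-bbc-v1.0.3-1.el7.centos.x86_64.rpm` then reinstalls the package only when needed.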
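3. Download the OFED package. You can download it from the Baidu mirror, or download it from the OFED website and upload it to the instance:
- Baidu mirror: for example, to download OFED 5.8-2.0.3, run the following command on the instance:
wget http://mirrors.baidubce.com/mlnx-ofed/5.8-2.0.3.0/MLNX_OFED_LINUX-5.8-2.0.3.0-rhel7.9-x86_64.tgz
- OFED website: download the OFED package for the required version from the official OFED website.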
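Every mirror URL in this document follows the same pattern. The OFED version appears as the directory name and again in the file name, followed by the distribution tag. The sketch below builds the URL from those two values. It also parses them back out of a tarball name, so the build script's mlnx_ofed_version and --distro values stay consistent with the file you actually downloaded. The sketch assumes that other versions and distributions use the same layout on the mirror. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build the Baidu mirror URL for an OFED tarball, and parse the
# version and distro tag back out of a tarball name.
# Assumption: the URL layout matches the examples in this document.

# ofed_mirror_url <ofed-version> <distro-tag> [arch]
ofed_mirror_url() {
    local ver=$1 distro=$2 arch=${3:-x86_64}
    printf 'http://mirrors.baidubce.com/mlnx-ofed/%s/MLNX_OFED_LINUX-%s-%s-%s.tgz\n' \
        "$ver" "$ver" "$distro" "$arch"
}

# parse_ofed_tarball <path>: print "<ofed-version> <distro-tag>", or fail
parse_ofed_tarball() {
    local name re
    name=$(basename "$1")
    re='^MLNX_OFED_LINUX-([0-9]+\.[0-9]+-[0-9.]+)-([a-z]+[0-9.]+)-([A-Za-z0-9_]+)\.tgz$'
    [[ $name =~ $re ]] || return 1
    printf '%s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
}
```

For example, `wget "$(ofed_mirror_url 5.8-2.0.3.0 rhel7.9)"` fetches the same file as the command above. `read -r mlnx_ofed_version distro < <(parse_ofed_tarball MLNX_OFED_LINUX-5.8-2.0.3.0-rhel7.9-x86_64.tgz)` yields 5.8-2.0.3.0 and rhel7.9, the value to pass to --distro.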
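4. Build and install OFED
Run the following script to build and install the OFED driver. The whole build and installation takes about half an hour.
Note: put the script and the OFED package in the same directory, and change the OFED version and kernel version in the script to the versions you actually use.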
#!/usr/bin/env bash
# Update mlnx ofed drivers
# -- Prepare mlnx ofed drivers
# -- Extract and install

mlnx_ofed_version=5.8-2.0.3.0
mlnx_ofed_drv=MLNX_OFED_LINUX-${mlnx_ofed_version}-rhel7.9-x86_64.tgz
kern_ver=3.10.0-1160.90.1.el7.x86_64

if [ ! -d /lib/modules/${kern_ver}/build ]; then
    echo "There is no kernel build directory. Please check if kernel-devel is installed ..."
    exit 1
fi
if ! which gcc >& /dev/null; then
    yum install -y gcc
fi
if ! which make >& /dev/null; then
    yum install -y make
fi

# Install build requirements
yum install -y elfutils-libelf-devel createrepo python-devel redhat-rpm-config rpm-build libtool
# Install runtime requirements for all OFED components
yum install -y tcl gcc-gfortran fuse-libs tk libnl3-devel

# Install the updated mlnx_ofed packages, including kernel modules and userspace packages
mkdir update_drivers
tar xf ${mlnx_ofed_drv} --strip-components 2 -C update_drivers/
cd update_drivers
./mlnxofedinstall --without-fw-update --add-kernel-support -k ${kern_ver} --skip-distro-check \
    --package-install-options "--force" --distro rhel7.9 -q
if [ $? -ne 0 ]; then
    echo "MLNX OFED driver install ... Failed."
    exit 1
fi
cd ..

# 82-net-setup-link.rules causes NIC names to change, so disable this rule.
if [ -f /usr/lib/udev/rules.d/82-net-setup-link.rules ]; then
    mv /usr/lib/udev/rules.d/82-net-setup-link.rules /usr/lib/udev/rules.d/82-net-setup-link.rules.orig
fi

if [ -f /usr/lib/udev/rules.d/83-mlnx-sf-name.rules ]; then
    mv /usr/lib/udev/rules.d/83-mlnx-sf-name.rules /usr/lib/udev/rules.d/83-mlnx-sf-name.rules.orig
fi

# Disable rshim (tmfifo_net0)
systemctl disable rshim.service

rm -rf ${mlnx_ofed_drv} update_drivers
rm -rf /tmp/MLNX* /tmp/*.conf

# Update the initramfs
dracut -f /boot/initramfs-${kern_ver}.img ${kern_ver}
- If rdma-userspace-config was removed in step 2, reinstall it:
wget -q http://mirrors.baidubce.com/baidu/rdma_specs/rdma-userspace-config-bbc-v1.0.3-1.el7.centos.x86_64.rpm
rpm -ivh rdma-userspace-config-bbc-v1.0.3-1.el7.centos.x86_64.rpm
service rdma start
- Reboot the instance into the new kernel.
- Verify the installation:
# Check that OFED 5.8-2.0.3 is installed and was built for the current kernel
[root@localhost ~]# rpm -qa | grep mlnx | grep 3.10.0_1160
mlnx-ofa_kernel-modules-5.8-OFED.5.8.2.0.3.1.kver.3.10.0_1160.90.1.el7.x86_64.x86_64
mlnx-ofa_kernel-devel-5.8-OFED.5.8.2.0.3.1.kver.3.10.0_1160.90.1.el7.x86_64.x86_64
knem-modules-1.1.4.90mlnx1-OFED.5.8.0.4.7.1.kver.3.10.0_1160.90.1.el7.x86_64.x86_64
[root@localhost ~]# modinfo mlx5_core
filename: /lib/modules/3.10.0-1160.90.1.el7.x86_64/extra/mlnx-ofa_kernel/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko
alias: auxiliary:mlx5_core.eth-rep
alias: auxiliary:mlx5_core.eth
basedon: Korg 5.17-rc4
version: 5.8-2.0.3
license: Dual BSD/GPL
description: Mellanox 5th generation network adapters (ConnectX series) core driver
author: Eli Cohen <eli@mellanox.com>
retpoline: Y
rhelversion: 7.9
srcversion: 6A14E2ECBAE645B024A60B6
...
# Check the NICs
[root@localhost ~]# ifconfig
# Check the NIC driver (replace ethX with the actual interface name)
[root@localhost ~]# ethtool -i ethX

# Check package dependencies. If dependencies are missing, packages are duplicated, or other problems are reported, resolve them manually, for example by installing the missing dependencies or removing the duplicate packages.
[root@localhost ~]# yum check dependencies
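In the rpm -qa output, the kernel version appears after kver. with its hyphen replaced by an underscore: 3.10.0-1160... becomes 3.10.0_1160.... This is presumably because RPM does not allow hyphens in version and release fields, and it is why the grep above uses 3.10.0_1160. The sketch below runs the same check for any kernel and defaults to the running one. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check that OFED kernel RPMs exist for a kernel (default: running kernel).

# ofed_rpm_kver <kernel-version>: the kernel version as embedded in OFED RPM names
ofed_rpm_kver() {
    printf '%s\n' "${1//-/_}"
}

# verify_ofed_rpms [kernel-version]: print matching packages, fail if none
verify_ofed_rpms() {
    local kver=${1:-$(uname -r)} tag
    tag="kver.$(ofed_rpm_kver "$kver")"
    rpm -qa | grep -F -- "$tag" || {
        echo "no OFED kernel packages built for $kver" >&2
        return 1
    }
}
```

After the reboot, run `verify_ofed_rpms` with no argument to confirm that packages exist for the kernel the instance actually booted.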
CentOS 8
This example upgrades the kernel to kernel-4.18.0-348.7.1.el8_5.x86_64.
1. Install the kernel and kernel development packages. On the instance, install the kernel-4.18.0-348.7.1.el8_5.x86_64 kernel and the matching development packages:
yum install -y kernel-4.18.0-348.7.1.el8_5.x86_64 \
    kernel-devel-4.18.0-348.7.1.el8_5.x86_64 \
    kernel-headers-4.18.0-348.7.1.el8_5.x86_64 \
    kernel-tools-4.18.0-348.7.1.el8_5.x86_64
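Before you reboot into the new kernel in step 4, confirm that GRUB will actually boot it. yum usually makes a newly installed kernel the default, but not always, for example when UPDATEDEFAULT=no is set in /etc/sysconfig/kernel. The sketch below uses grubby, which is available on CentOS 7 and 8. The function name is hypothetical, and so is the BOOT_DIR override, which exists only so the function can be tried against a scratch directory.

```shell
#!/usr/bin/env bash
# Sketch: make sure GRUB's default entry is the kernel that was just installed.
# BOOT_DIR is a hypothetical override; normally the kernel image lives in /boot.

# ensure_default_kernel <kernel-version>: set it as the default if needed,
# then print the default kernel path reported by grubby.
ensure_default_kernel() {
    local want="${BOOT_DIR:-/boot}/vmlinuz-$1"
    if [ ! -e "$want" ]; then
        echo "$want not found; is kernel-$1 installed?" >&2
        return 1
    fi
    if [ "$(grubby --default-kernel)" != "$want" ]; then
        grubby --set-default "$want" >/dev/null || return 1
    fi
    grubby --default-kernel
}
```

Usage: `ensure_default_kernel 4.18.0-348.7.1.el8_5.x86_64` should print /boot/vmlinuz-4.18.0-348.7.1.el8_5.x86_64 before you reboot.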
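2. Check for the lossless RDMA configuration package. Some instances have the lossless RDMA configuration package rdma-userspace-config installed. Remove it, and reinstall it after OFED has been installed:
# Check whether rdma-userspace-config is installed
rpm -qa | grep -i rdma-userspace-config
If the package is installed, remove it:
# Remove the package if it is installed
rpm -e rdma-userspace-config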
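3. Download the OFED package. You can download it from the Baidu mirror, or download it from the OFED website and upload it to the instance:
- Baidu mirror: for example, to download OFED 5.8-2.0.3, run the following command on the instance:
wget http://mirrors.baidubce.com/mlnx-ofed/5.8-2.0.3.0/MLNX_OFED_LINUX-5.8-2.0.3.0-rhel8.4-x86_64.tgz
- OFED website: download the OFED package for the required version from the official OFED website.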
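4. Build and install OFED
Run the following script to build and install the OFED driver. The whole build and installation takes about half an hour.
Note: put the script and the OFED package in the same directory, and change the OFED version and kernel version in the script to the versions you actually use.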
#!/usr/bin/env bash
# Update mlnx ofed drivers
# -- Prepare mlnx ofed drivers
# -- Extract and install

mlnx_ofed_version=5.8-2.0.3.0
mlnx_ofed_drv=MLNX_OFED_LINUX-${mlnx_ofed_version}-rhel8.4-x86_64.tgz
kern_ver=4.18.0-348.7.1.el8_5.x86_64

if [ ! -d /lib/modules/${kern_ver}/build ]; then
    echo "There is no kernel build directory. Please check if kernel-devel is installed ..."
    exit 1
fi
if ! which gcc >& /dev/null; then
    yum install -y gcc
fi
if ! which make >& /dev/null; then
    yum install -y make
fi

# Install build requirements
yum install -y createrepo python36-devel libtool python36 kernel-rpm-macros gdb-headless rpm-build elfutils-libelf-devel
# Install runtime requirements for all OFED components
yum install -y tk gcc-gfortran tcsh tcl libnl3-devel perl-Math-Complex cmake-filesystem

# Install the updated mlnx_ofed packages, including kernel modules and userspace packages
mkdir update_drivers
tar xf ${mlnx_ofed_drv} --strip-components 2 -C update_drivers/
cd update_drivers
./mlnxofedinstall --without-fw-update --add-kernel-support -k ${kern_ver} --skip-distro-check \
    --package-install-options "--force" --distro rhel8.4 -q
if [ $? -ne 0 ]; then
    echo "MLNX OFED driver install ... Failed."
    exit 1
fi
cd ..

# 82-net-setup-link.rules causes NIC names to change, so disable this rule.
if [ -f /usr/lib/udev/rules.d/82-net-setup-link.rules ]; then
    mv /usr/lib/udev/rules.d/82-net-setup-link.rules /usr/lib/udev/rules.d/82-net-setup-link.rules.orig
fi

if [ -f /usr/lib/udev/rules.d/83-mlnx-sf-name.rules ]; then
    mv /usr/lib/udev/rules.d/83-mlnx-sf-name.rules /usr/lib/udev/rules.d/83-mlnx-sf-name.rules.orig
fi

# Disable rshim (tmfifo_net0)
systemctl disable rshim.service

rm -rf ${mlnx_ofed_drv} update_drivers
rm -rf /tmp/MLNX* /tmp/*.conf

# Update the initramfs
dracut -f /boot/initramfs-${kern_ver}.img ${kern_ver}
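Because the build runs for about half an hour, a quick pre-flight check of the dependencies can prevent a failed run. The minimal sketch below lists which of the given package names are not yet installed. It checks names with rpm -q only, so it still reports a dependency that a differently named package satisfies; this is an accepted limitation of the sketch. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report which of the given RPM package names are not installed.

# missing_rpms <pkg>...: print each package that rpm -q does not find
missing_rpms() {
    local p
    for p in "$@"; do
        rpm -q "$p" >/dev/null 2>&1 || printf '%s\n' "$p"
    done
    return 0
}
```

Usage: `missing=$(missing_rpms createrepo python36-devel libtool kernel-rpm-macros rpm-build elfutils-libelf-devel); [ -z "$missing" ] || yum install -y $missing`.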
- If rdma-userspace-config was removed in step 2, reinstall it:
wget -q http://mirrors.baidubce.com/baidu/rdma_specs/rdma-userspace-config-bbc-v1.0.3-1.el7.centos.x86_64.rpm
rpm -ivh --nodeps --force rdma-userspace-config-bbc-v1.0.3-1.el7.centos.x86_64.rpm
service rdma start
- Reboot the instance into the new kernel.
- Verify the installation:
# Check that OFED 5.8-2.0.3 is installed and was built for the current kernel
[root@localhost ~]# rpm -qa | grep mlnx | grep 4.18.0_348
mlnx-ofa_kernel-devel-5.8-OFED.5.8.2.0.3.1.kver.4.18.0_348.7.1.el8_5.x86_64.x86_64
knem-modules-1.1.4.90mlnx1-OFED.5.8.0.4.7.1.kver.4.18.0_348.7.1.el8_5.x86_64.x86_64
mlnx-ofa_kernel-modules-5.8-OFED.5.8.2.0.3.1.kver.4.18.0_348.7.1.el8_5.x86_64.x86_64
[root@localhost ~]# modinfo mlx5_core
filename: /lib/modules/4.18.0-348.7.1.el8_5.x86_64/extra/mlnx-ofa_kernel/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko
alias: auxiliary:mlx5_core.eth-rep
alias: auxiliary:mlx5_core.eth
basedon: Korg 5.17-rc4
version: 5.8-2.0.3
license: Dual BSD/GPL
description: Mellanox 5th generation network adapters (ConnectX series) core driver
author: Eli Cohen <eli@mellanox.com>
rhelversion: 8.5
srcversion: AED21A09CA345D254692F69
...
# Check the NICs
[root@localhost ~]# ifconfig
# Check the NIC driver (replace ethX with the actual interface name)
[root@localhost ~]# ethtool -i ethX
# Check package dependencies. If dependencies are missing, packages are duplicated, or other problems are reported, resolve them manually, for example by installing the missing dependencies or removing the duplicate packages.
[root@localhost ~]# yum check
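modinfo reads the module file on disk, so it shows which mlx5_core would be loaded, not necessarily which one is loaded. After the reboot, you can read the version of the module that is actually in memory from sysfs, because modules that declare a version expose it in /sys/module/<name>/version. The sketch below compares that value with the expected OFED version, 5.8-2.0.3 in this example. The function name is hypothetical, and so is the optional sysfs-root argument, which exists only so the function can be tested.

```shell
#!/usr/bin/env bash
# Sketch: compare the in-memory version of a module with an expected version.

# check_loaded_version <module> <expected-version> [sysfs-module-root]
check_loaded_version() {
    local f="${3:-/sys/module}/$1/version" got
    if [ ! -r "$f" ]; then
        echo "$1 is not loaded or exposes no version" >&2
        return 2
    fi
    got=$(cat "$f")
    if [ "$got" = "$2" ]; then
        echo "$1 loaded, version $got"
    else
        echo "$1 loaded, version $got (expected $2)" >&2
        return 1
    fi
}
```

Usage: `check_loaded_version mlx5_core 5.8-2.0.3`.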
Rocky Linux 8
This example upgrades the kernel to kernel-4.18.0-477.13.1.el8_8.x86_64.
1. Install the kernel and kernel development packages. On the instance, install the kernel-4.18.0-477.13.1.el8_8.x86_64 kernel and the matching development packages:
yum install -y kernel-4.18.0-477.13.1.el8_8.x86_64 \
    kernel-devel-4.18.0-477.13.1.el8_8.x86_64 \
    kernel-headers-4.18.0-477.13.1.el8_8.x86_64 \
    kernel-tools-4.18.0-477.13.1.el8_8.x86_64
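2. Check for the lossless RDMA configuration package. Some instances have the lossless RDMA configuration package rdma-userspace-config installed. Remove it, and reinstall it after OFED has been installed:
# Check whether rdma-userspace-config is installed
rpm -qa | grep -i rdma-userspace-config
If the package is installed, remove it:
# Remove the package if it is installed
rpm -e rdma-userspace-config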
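3. Download the OFED package. You can download it from the Baidu mirror, or download it from the OFED website and upload it to the instance:
- Baidu mirror: for example, to download OFED 5.8-2.0.3, run the following command on the instance:
wget http://mirrors.baidubce.com/mlnx-ofed/5.8-2.0.3.0/MLNX_OFED_LINUX-5.8-2.0.3.0-rhel8.7-x86_64.tgz
- OFED website: download the OFED package for the required version from the official OFED website.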
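4. Build and install OFED
Run the following script to build and install the OFED driver.
Note: put the script and the OFED package in the same directory, and change the OFED version and kernel version in the script to the versions you actually use.
The whole build and installation takes about half an hour.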
#!/usr/bin/env bash
# Update mlnx ofed drivers
# -- Prepare mlnx ofed drivers
# -- Extract and install

mlnx_ofed_version=5.8-2.0.3.0
mlnx_ofed_drv=MLNX_OFED_LINUX-${mlnx_ofed_version}-rhel8.7-x86_64.tgz
kern_ver=4.18.0-477.13.1.el8_8.x86_64

if [ ! -d /lib/modules/${kern_ver}/build ]; then
    echo "There is no kernel build directory. Please check if kernel-devel is installed ..."
    exit 1
fi
if ! which gcc >& /dev/null; then
    yum install -y gcc
fi
if ! which make >& /dev/null; then
    yum install -y make
fi

# Install build requirements
yum install -y createrepo python36-devel libtool python36 kernel-rpm-macros gdb-headless rpm-build elfutils-libelf-devel
# Install runtime requirements for all OFED components
yum install -y tk gcc-gfortran tcsh tcl libnl3-devel perl-Math-Complex cmake-filesystem

# Install the updated mlnx_ofed packages, including kernel modules and userspace packages
mkdir update_drivers
tar xf ${mlnx_ofed_drv} --strip-components 2 -C update_drivers/
cd update_drivers
./mlnxofedinstall --without-fw-update --add-kernel-support -k ${kern_ver} --skip-distro-check \
    --package-install-options "--force" --distro rhel8.7 -q
if [ $? -ne 0 ]; then
    echo "MLNX OFED driver install ... Failed."
    exit 1
fi
cd ..

# 82-net-setup-link.rules causes NIC names to change, so disable this rule.
if [ -f /usr/lib/udev/rules.d/82-net-setup-link.rules ]; then
    mv /usr/lib/udev/rules.d/82-net-setup-link.rules /usr/lib/udev/rules.d/82-net-setup-link.rules.orig
fi

if [ -f /usr/lib/udev/rules.d/83-mlnx-sf-name.rules ]; then
    mv /usr/lib/udev/rules.d/83-mlnx-sf-name.rules /usr/lib/udev/rules.d/83-mlnx-sf-name.rules.orig
fi

# Disable rshim (tmfifo_net0)
systemctl disable rshim.service

rm -rf ${mlnx_ofed_drv} update_drivers
rm -rf /tmp/MLNX* /tmp/*.conf

# Update the initramfs
dracut -f /boot/initramfs-${kern_ver}.img ${kern_ver}
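Because the build takes about half an hour, a dropped SSH session can kill it partway through. One option, sketched below, is to start the script in the background under nohup and capture its output and exit code in files. The function names and the file layout (a log file plus a .rc file holding the exit code) are hypothetical. Running the script inside screen or tmux works just as well.

```shell
#!/usr/bin/env bash
# Sketch: run the OFED build script detached from the terminal.
# Output goes to <logfile>; the script's exit code is written to <logfile>.rc.

# run_build_detached <script> <logfile>: prints the PID of the background job
run_build_detached() {
    local script=$1 log=$2
    rm -f "$log.rc"
    nohup bash -c 'bash "$1"; echo $? > "$2.rc"' _ "$script" "$log" >"$log" 2>&1 &
    echo $!
}

# build_status <logfile>: "running", or the exit code once the build has finished
build_status() {
    if [ -s "$1.rc" ]; then cat "$1.rc"; else echo running; fi
}
```

Usage: `run_build_detached ./update_ofed.sh /root/ofed_build.log`, then check `build_status /root/ofed_build.log` later. The script name and log path are examples only.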
- If rdma-userspace-config was removed in step 2, reinstall it:
wget -q http://mirrors.baidubce.com/baidu/rdma_specs/rdma-userspace-config-bbc-v1.0.3-1.el7.centos.x86_64.rpm
rpm -ivh --nodeps --force rdma-userspace-config-bbc-v1.0.3-1.el7.centos.x86_64.rpm
service rdma start
- Reboot the instance into the new kernel.
- Verify the installation:
# Check that OFED 5.8-2.0.3 is installed and was built for the current kernel
[root@localhost ~]# rpm -qa | grep mlnx | grep 4.18.0_477
mlnx-ofa_kernel-modules-5.8-OFED.5.8.2.0.3.1.kver.4.18.0_477.13.1.el8_8.x86_64.x86_64
mlnx-ofa_kernel-devel-5.8-OFED.5.8.2.0.3.1.kver.4.18.0_477.13.1.el8_8.x86_64.x86_64
knem-modules-1.1.4.90mlnx1-OFED.5.8.0.4.7.1.kver.4.18.0_477.13.1.el8_8.x86_64.x86_64
[root@localhost ~]# modinfo mlx5_core
filename: /lib/modules/4.18.0-477.13.1.el8_8.x86_64/extra/mlnx-ofa_kernel/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko
alias: auxiliary:mlx5_core.eth-rep
alias: auxiliary:mlx5_core.eth
basedon: Korg 5.17-rc4
version: 5.8-2.0.3
license: Dual BSD/GPL
description: Mellanox 5th generation network adapters (ConnectX series) core driver
author: Eli Cohen <eli@mellanox.com>
rhelversion: 8.8
srcversion: AED21A09CA345D254692F69
...
# Check the NICs
[root@localhost ~]# ifconfig
# Check the NIC driver (replace ethX with the actual interface name)
[root@localhost ~]# ethtool -i ethX
# Check package dependencies. If dependencies are missing, packages are duplicated, or other problems are reported, resolve them manually, for example by installing the missing dependencies or removing the duplicate packages.
[root@localhost ~]# yum check
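You can extend the ethtool -i check to every interface at once. The sketch below reads the driver: and version: lines from ethtool -i output, which uses a "key: value" layout. It lists the interfaces handled by mlx5_core together with the reported driver version. Once the OFED driver is in use, that version should be the OFED version (5.8-2.0.3 here). The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list interfaces handled by mlx5_core and the driver version ethtool reports.

# ethtool_field <field>: read `ethtool -i` output on stdin, print that field's value
ethtool_field() {
    awk -F': ' -v k="$1" '$1 == k { print $2; exit }'
}

# list_mlx5_ifaces: print "<iface> <driver-version>" for each mlx5_core interface
list_mlx5_ifaces() {
    local path dev info
    for path in /sys/class/net/*; do
        dev=${path##*/}
        info=$(ethtool -i "$dev" 2>/dev/null) || continue
        if [ "$(ethtool_field driver <<<"$info")" = "mlx5_core" ]; then
            printf '%s %s\n' "$dev" "$(ethtool_field version <<<"$info")"
        fi
    done
}
```

Usage: `list_mlx5_ifaces` prints one line per RDMA-capable NIC; an empty result suggests that no interface is using mlx5_core.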
Rocky Linux 9
This example upgrades the kernel to kernel-5.14.0-284.11.1.el9_2.x86_64.
1. Install the kernel and kernel development packages. On the instance, install the kernel-5.14.0-284.11.1.el9_2.x86_64 kernel and the matching development packages:
yum install -y kernel-5.14.0-284.11.1.el9_2.x86_64 \
    kernel-devel-5.14.0-284.11.1.el9_2.x86_64 \
    kernel-headers-5.14.0-284.11.1.el9_2.x86_64 \
    kernel-tools-5.14.0-284.11.1.el9_2.x86_64
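2. Check for the lossless RDMA configuration package. Some instances have the lossless RDMA configuration package rdma-userspace-config installed. Remove it, and reinstall it after OFED has been installed:
# Check whether rdma-userspace-config is installed
rpm -qa | grep -i rdma-userspace-config
If the package is installed, remove it:
# Remove the package if it is installed
rpm -e rdma-userspace-config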
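3. Download the OFED package. You can download it from the Baidu mirror, or download it from the OFED website and upload it to the instance:
- Baidu mirror: for example, to download OFED 5.8-2.0.3, run the following command on the instance:
wget http://mirrors.baidubce.com/mlnx-ofed/5.8-2.0.3.0/MLNX_OFED_LINUX-5.8-2.0.3.0-rhel9.1-x86_64.tgz
- OFED website: download the OFED package for the required version from the official OFED website.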
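4. Build and install OFED
Run the following script to build and install the OFED driver.
Note: put the script and the OFED package in the same directory, and change the OFED version and kernel version in the script to the versions you actually use.
The whole build and installation takes about half an hour.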
#!/usr/bin/env bash
# Update mlnx ofed drivers
# -- Prepare mlnx ofed drivers
# -- Extract and install

mlnx_ofed_version=5.8-2.0.3.0
mlnx_ofed_drv=MLNX_OFED_LINUX-${mlnx_ofed_version}-rhel9.1-x86_64.tgz
kern_ver=5.14.0-284.11.1.el9_2.x86_64

if [ ! -d /lib/modules/${kern_ver}/build ]; then
    echo "There is no kernel build directory. Please check if kernel-devel is installed ..."
    exit 1
fi
if ! which gcc >& /dev/null; then
    yum install -y gcc
fi
if ! which make >& /dev/null; then
    yum install -y make
fi

# Install build requirements
yum install -y perl createrepo python3-devel libtool kernel-rpm-macros gdb-headless rpm-build elfutils-libelf-devel
# Install runtime requirements for all OFED components
yum install -y tk gcc-gfortran tcsh tcl libnl3-devel cmake-filesystem

# Install the updated mlnx_ofed packages, including kernel modules and userspace packages
mkdir update_drivers
tar xf ${mlnx_ofed_drv} --strip-components 2 -C update_drivers/
cd update_drivers
./mlnxofedinstall --without-fw-update --add-kernel-support -k ${kern_ver} --skip-distro-check \
    --package-install-options "--force" --distro rhel9.1 -q
if [ $? -ne 0 ]; then
    echo "MLNX OFED driver install ... Failed."
    exit 1
fi
cd ..

# 82-net-setup-link.rules causes NIC names to change, so disable this rule.
if [ -f /usr/lib/udev/rules.d/82-net-setup-link.rules ]; then
    mv /usr/lib/udev/rules.d/82-net-setup-link.rules /usr/lib/udev/rules.d/82-net-setup-link.rules.orig
fi

if [ -f /usr/lib/udev/rules.d/83-mlnx-sf-name.rules ]; then
    mv /usr/lib/udev/rules.d/83-mlnx-sf-name.rules /usr/lib/udev/rules.d/83-mlnx-sf-name.rules.orig
fi

# Disable rshim (tmfifo_net0)
systemctl disable rshim.service

rm -rf ${mlnx_ofed_drv} update_drivers
rm -rf /tmp/MLNX* /tmp/*.conf

# Update the initramfs
dracut -f /boot/initramfs-${kern_ver}.img ${kern_ver}
- If rdma-userspace-config was removed in step 2, reinstall it:
wget -q http://mirrors.baidubce.com/baidu/rdma_specs/rdma-userspace-config-bbc-v1.0.3-1.el7.centos.x86_64.rpm
rpm -ivh --nodeps --force rdma-userspace-config-bbc-v1.0.3-1.el7.centos.x86_64.rpm
service rdma start
- Reboot the instance into the new kernel.
- Verify the installation:
# Check that OFED 5.8-2.0.3 is installed and was built for the current kernel
[root@localhost ~]# rpm -qa | grep mlnx | grep 5.14.0_284
mlnx-ofa_kernel-modules-5.8-OFED.5.8.2.0.3.1.kver.5.14.0_284.11.1.el9_2.x86_64.x86_64
mlnx-ofa_kernel-devel-5.8-OFED.5.8.2.0.3.1.kver.5.14.0_284.11.1.el9_2.x86_64.x86_64
knem-modules-1.1.4.90mlnx1-OFED.5.8.0.4.7.1.kver.5.14.0_284.11.1.el9_2.x86_64.x86_64
[root@localhost ~]# modinfo mlx5_core
filename: /lib/modules/5.14.0-284.11.1.el9_2.x86_64/extra/mlnx-ofa_kernel/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko
alias: auxiliary:mlx5_core.eth-rep
alias: auxiliary:mlx5_core.eth
basedon: Korg 5.17-rc4
version: 5.8-2.0.3
license: Dual BSD/GPL
description: Mellanox 5th generation network adapters (ConnectX series) core driver
author: Eli Cohen <eli@mellanox.com>
rhelversion: 9.2
srcversion: E4CF39F4680CAB323741675
...
# Check the NICs
[root@localhost ~]# ifconfig
# Check the NIC driver (replace ethX with the actual interface name)
[root@localhost ~]# ethtool -i ethX
# Check package dependencies. If dependencies are missing, packages are duplicated, or other problems are reported, resolve them manually, for example by installing the missing dependencies or removing the duplicate packages.
[root@localhost ~]# yum check
Ubuntu 20.04
This example upgrades the kernel to 5.4.0-152-generic.
1. Install the kernel and kernel development packages. On the instance, install the 5.4.0-152-generic kernel and its related components:
apt-get install -y linux-image-5.4.0-152-generic \
    linux-headers-5.4.0-152 \
    linux-headers-5.4.0-152-generic \
    linux-modules-5.4.0-152-generic \
    linux-modules-extra-5.4.0-152-generic
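On Ubuntu, all five package names in step 1 derive from the kernel release string. The sketch below prints them for a -generic kernel. It does not handle other flavours, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: derive the Ubuntu kernel package names from a release such as 5.4.0-152-generic.
# Assumption: the kernel uses the "generic" flavour.

# ubuntu_kernel_pkgs <release>: print one package name per line
ubuntu_kernel_pkgs() {
    local rel=$1 base
    case "$rel" in
        *-generic) base=${rel%-generic} ;;  # e.g. 5.4.0-152
        *) echo "only -generic kernels are handled: $rel" >&2; return 1 ;;
    esac
    printf '%s\n' \
        "linux-image-$rel" \
        "linux-headers-$base" \
        "linux-headers-$rel" \
        "linux-modules-$rel" \
        "linux-modules-extra-$rel"
}
```

Usage: `apt-get install -y $(ubuntu_kernel_pkgs 5.4.0-152-generic)` installs the same set of packages as the command above.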
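2. Check for the lossless RDMA configuration package. Some instances have the lossless RDMA configuration package rdma-userspace-config installed. Remove it, and reinstall it after OFED has been installed:
# Check whether rdma-userspace-config is installed
dpkg -l | grep -i rdma-userspace-config
If the package is installed, remove it:
# Remove the package if it is installed
dpkg -r rdma-userspace-config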
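dpkg -l | grep also matches packages that were removed but still have configuration files (state rc), so a match does not always mean the package is installed. The sketch below asks dpkg-query for the status of every package whose name matches a pattern, and prints only the fully installed ones. The wildcard pattern covers the case where the installed name carries a suffix. Judging from the .deb file name used for the reinstall, the package may be named rdma-userspace-config-bbc. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the names of fully installed packages matching a dpkg-query pattern.

# installed_debs_matching <pattern>
installed_debs_matching() {
    dpkg-query -W -f='${Package} ${Status}\n' "$1" 2>/dev/null |
        awk '$2 == "install" && $3 == "ok" && $4 == "installed" { print $1 }'
}
```

Usage: `pkgs=$(installed_debs_matching '*rdma-userspace-config*'); [ -z "$pkgs" ] || dpkg -r $pkgs`.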
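3. Download the OFED package. You can download it from the Baidu mirror, or download it from the OFED website and upload it to the instance:
- Baidu mirror: for example, to download OFED 5.8-2.0.3, run the following command on the instance:
wget http://mirrors.baidubce.com/mlnx-ofed/5.8-2.0.3.0/MLNX_OFED_LINUX-5.8-2.0.3.0-ubuntu20.04-x86_64.tgz
- OFED website: download the OFED package for the required version from the official OFED website.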
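4. Build and install OFED
Run the following script to build and install the OFED driver. The whole build and installation takes about half an hour.
Note: put the script and the OFED package in the same directory, and change the OFED version and kernel version in the script to the versions you actually use.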
#!/usr/bin/env bash
# Update mlnx ofed drivers
# -- Prepare mlnx ofed drivers
# -- Extract and install

mlnx_ofed_version=5.8-2.0.3.0
mlnx_ofed_drv=MLNX_OFED_LINUX-${mlnx_ofed_version}-ubuntu20.04-x86_64.tgz
kern_ver=5.4.0-152-generic

if [ ! -d /lib/modules/${kern_ver}/build ]; then
    echo "There is no kernel build directory. Please check if linux-headers-${kern_ver} is installed ..."
    exit 1
fi

apt-get update
if ! which gcc >& /dev/null; then
    apt-get -y install gcc
fi
if ! which make >& /dev/null; then
    apt-get -y install make
fi

# Install the updated mlnx_ofed packages, including kernel modules and userspace packages
mkdir update_drivers
tar xf ${mlnx_ofed_drv} --strip-components 2 -C update_drivers/
pushd update_drivers
./mlnxofedinstall --without-fw-update --add-kernel-support -k ${kern_ver} --skip-distro-check -q
if [ $? -ne 0 ]; then
    echo "MLNX OFED driver install ... Failed."
    exit 1
fi
popd

# 82-net-setup-link.rules causes NIC names to change, so disable this rule.
if [ -f /lib/udev/rules.d/82-net-setup-link.rules ]; then
    mv /lib/udev/rules.d/82-net-setup-link.rules /lib/udev/rules.d/82-net-setup-link.rules.orig
fi

if [ -f /usr/lib/udev/rules.d/83-mlnx-sf-name.rules ]; then
    mv /usr/lib/udev/rules.d/83-mlnx-sf-name.rules /usr/lib/udev/rules.d/83-mlnx-sf-name.rules.orig
fi

# Disable rshim (tmfifo_net0)
systemctl disable rshim.service

rm -rf ${mlnx_ofed_drv} update_drivers
rm -rf /tmp/MLNX* /tmp/*.conf

# Update the initramfs. Ubuntu uses initramfs-tools by default, so fall back
# to update-initramfs when dracut is not installed.
if command -v dracut >/dev/null 2>&1; then
    dracut -f /boot/initrd.img-${kern_ver} ${kern_ver}
else
    update-initramfs -u -k ${kern_ver}
fi
- If rdma-userspace-config was removed in step 2, reinstall it:
wget -q http://mirrors.baidubce.com/baidu/rdma_specs/rdma-userspace-config-bbc_1.0.4_x86.deb
dpkg -i rdma-userspace-config-bbc_1.0.4_x86.deb
service rdma start
- Reboot the instance into the new kernel.
- Verify the installation:
# Check that OFED 5.8-2.0.3 is installed and was built for the current kernel
[root@localhost ~]# dpkg -l | grep mlnx | grep 5.4.0-152
ii knem-modules 1.1.4.90mlnx1-OFED.5.8.0.4.7.1.kver.5.4.0-152-generic amd64 kernel module for high-performance intra-node MPI communication for large messages
ii mlnx-ofed-kernel-modules 5.8-OFED.5.8.2.0.3.1.kver.5.4.0-152-generic amd64 mlnx-ofed kernel modules
ii mlnx-ofed-kernel-utils 5.8-OFED.5.8.2.0.3.1.kver.5.4.0-152-generic amd64 Userspace tools to restart and tune mlnx-ofed kernel modules
[root@localhost ~]# modinfo mlx5_core
filename: /lib/modules/5.4.0-152-generic/updates/dkms/mlx5_core.ko
alias: auxiliary:mlx5_core.eth-rep
alias: auxiliary:mlx5_core.eth
basedon: Korg 5.17-rc4
version: 5.8-2.0.3
license: Dual BSD/GPL
description: Mellanox 5th generation network adapters (ConnectX series) core driver
author: Eli Cohen <eli@mellanox.com>
srcversion: E9B015CBD3F471BDD31CB24
...
# Check the NICs
[root@localhost ~]# ifconfig
# Check the NIC driver (replace ethX with the actual interface name)
[root@localhost ~]# ethtool -i ethX
# Check package dependencies. If dependencies are missing, packages are duplicated, or other problems are reported, resolve them manually, for example by installing the missing dependencies or removing the duplicate packages.
[root@localhost ~]# apt-get check
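Unlike the RPM packages, the Debian packages keep the kernel release unchanged after kver. (for example kver.5.4.0-152-generic), so the check can match the output of uname -r directly. The sketch below runs this check and defaults to the running kernel. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list OFED kernel packages built for a kernel release (default: running kernel).

# verify_ofed_debs [release]: print matching "<package> <version>" lines, fail if none
verify_ofed_debs() {
    local rel=${1:-$(uname -r)}
    dpkg-query -W -f='${Package} ${Version}\n' 2>/dev/null | grep -F -- "kver.$rel" || {
        echo "no OFED kernel packages built for $rel" >&2
        return 1
    }
}
```

After the reboot, run `verify_ofed_debs` with no argument to check the kernel the instance actually booted.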
Ubuntu 22.04
This example upgrades the kernel to 5.15.0-75-generic.
1. Install the kernel and kernel development packages. On the instance, install the 5.15.0-75-generic kernel and its related components:
apt-get install -y linux-image-5.15.0-75-generic \
    linux-headers-5.15.0-75 \
    linux-headers-5.15.0-75-generic \
    linux-modules-5.15.0-75-generic \
    linux-modules-extra-5.15.0-75-generic
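2. Check for the lossless RDMA configuration package. Some instances have the lossless RDMA configuration package rdma-userspace-config installed. Remove it, and reinstall it after OFED has been installed:
# Check whether rdma-userspace-config is installed
dpkg -l | grep -i rdma-userspace-config
# Remove the package if it is installed
dpkg -r rdma-userspace-config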
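3. Download the OFED package. You can download it from the Baidu mirror, or download it from the OFED website and upload it to the instance:
- Baidu mirror: for example, to download OFED 5.8-2.0.3, run the following command on the instance:
wget http://mirrors.baidubce.com/mlnx-ofed/5.8-2.0.3.0/MLNX_OFED_LINUX-5.8-2.0.3.0-ubuntu22.04-x86_64.tgz
- OFED website: download the OFED package for the required version from the official OFED website.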
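4. Build and install OFED
Run the following script to build and install the OFED driver. The whole build and installation takes about half an hour.
Note: put the script and the OFED package in the same directory, and change the OFED version and kernel version in the script to the versions you actually use.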
#!/usr/bin/env bash
# Update mlnx ofed drivers
# -- Prepare mlnx ofed drivers
# -- Extract and install

mlnx_ofed_version=5.8-2.0.3.0
mlnx_ofed_drv=MLNX_OFED_LINUX-${mlnx_ofed_version}-ubuntu22.04-x86_64.tgz
kern_ver=5.15.0-75-generic

if [ ! -d /lib/modules/${kern_ver}/build ]; then
    echo "There is no kernel build directory. Please check if linux-headers-${kern_ver} is installed ..."
    exit 1
fi

apt-get update
if ! which gcc >& /dev/null; then
    apt-get -y install gcc
fi
if ! which make >& /dev/null; then
    apt-get -y install make
fi

# Install the packages required to install MLNX_OFED_LINUX
apt-get install -y libnl-route-3-200 libnl-route-3-dev libnl-3-dev bison libfuse2 flex gfortran tk libnuma-dev libgfortran5

# Install the updated mlnx_ofed packages, including kernel modules and userspace packages
mkdir update_drivers
tar xf ${mlnx_ofed_drv} --strip-components 2 -C update_drivers/
pushd update_drivers
./mlnxofedinstall --without-fw-update --add-kernel-support -k ${kern_ver} --skip-distro-check -q
if [ $? -ne 0 ]; then
    echo "MLNX OFED driver install ... Failed."
    exit 1
fi
popd

# 82-net-setup-link.rules causes NIC names to change, so disable this rule.
if [ -f /lib/udev/rules.d/82-net-setup-link.rules ]; then
    mv /lib/udev/rules.d/82-net-setup-link.rules /lib/udev/rules.d/82-net-setup-link.rules.orig
fi

if [ -f /usr/lib/udev/rules.d/83-mlnx-sf-name.rules ]; then
    mv /usr/lib/udev/rules.d/83-mlnx-sf-name.rules /usr/lib/udev/rules.d/83-mlnx-sf-name.rules.orig
fi

# Disable rshim (tmfifo_net0)
systemctl disable rshim.service

rm -rf ${mlnx_ofed_drv} update_drivers
rm -rf /tmp/MLNX* /tmp/*.conf

# Update the initramfs. Ubuntu uses initramfs-tools by default, so fall back
# to update-initramfs when dracut is not installed.
if command -v dracut >/dev/null 2>&1; then
    dracut -f /boot/initrd.img-${kern_ver} ${kern_ver}
else
    update-initramfs -u -k ${kern_ver}
fi
- If rdma-userspace-config was removed in step 2, reinstall it:
wget -q http://mirrors.baidubce.com/baidu/rdma_specs/rdma-userspace-config-bbc_1.0.4_x86.deb
dpkg -i rdma-userspace-config-bbc_1.0.4_x86.deb
service rdma start
- Reboot the instance into the new kernel.
- Verify the installation:
# Check that OFED 5.8-2.0.3 is installed and was built for the current kernel
[root@localhost ~]# dpkg -l | grep mlnx | grep 5.15.0-75
ii knem-modules 1.1.4.90mlnx1-OFED.5.8.0.4.7.1.kver.5.15.0-75-generic amd64 kernel module for high-performance intra-node MPI communication for large messages
ii mlnx-ofed-kernel-modules 5.8-OFED.5.8.2.0.3.1.kver.5.15.0-75-generic amd64 mlnx-ofed kernel modules
ii mlnx-ofed-kernel-utils 5.8-OFED.5.8.2.0.3.1.kver.5.15.0-75-generic amd64 Userspace tools to restart and tune mlnx-ofed kernel modules
[root@localhost ~]# modinfo mlx5_core
filename: /lib/modules/5.15.0-75-generic/updates/dkms/mlx5_core.ko
alias: auxiliary:mlx5_core.eth-rep
alias: auxiliary:mlx5_core.eth
basedon: Korg 5.17-rc4
version: 5.8-2.0.3
license: Dual BSD/GPL
description: Mellanox 5th generation network adapters (ConnectX series) core driver
author: Eli Cohen <eli@mellanox.com>
srcversion: E4CF39F4680CAB323741675
...
# Check the NICs
[root@localhost ~]# ifconfig
# Check the NIC driver (replace ethX with the actual interface name)
[root@localhost ~]# ethtool -i ethX
# Check package dependencies. If dependencies are missing, packages are duplicated, or other problems are reported, resolve them manually, for example by installing the missing dependencies or removing the duplicate packages.
[root@localhost ~]# apt-get check