Installing an ARM k8s Cluster on Rocky 9 with kk (KubeKey)

Environment

Type            Version
Physical host   ARM
Base OS         Rocky-9.5-aarch64-minimal.iso
kk (KubeKey)    3.1.9
k8s             1.26.6

Online Installation

Consolidated steps

Online k8s install

# Configure DNS (optional, depending on your network)
nmcli connection modify enp3s0 ipv4.dns "192.168.10.200 8.8.8.8"
nmcli connection up enp3s0
# Allow ssh (root ssh can be enabled during OS install) and remote env passing
echo 'AcceptEnv LANG LC_*' | tee -a /etc/ssh/sshd_config.d/01-permitrootlogin.conf
systemctl restart sshd
# Install the required dependencies
dnf install -y conntrack socat tar
# Generate config-sample.yaml, then edit it (the file is explained in the appendix)
./kk create config
# Install the private Harbor certificate (optional)
sudo cp ghspace-ca.crt /etc/pki/ca-trust/source/anchors/
sudo update-ca-trust extract
sudo mkdir -p /etc/containerd/certs.d/harbor.ghspace.cn
sudo cp ghspace-ca.crt /etc/containerd/certs.d/harbor.ghspace.cn/ca.crt
# For networks inside China
export KKZONE=cn
# Install the cluster
./kk create cluster -f config-sample.yaml
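
Once kk reports success, a quick sanity check (a minimal addition, not part of the original steps) confirms the nodes registered and the core pods came up:

kubectl get nodes -o wide   # all nodes should be Ready
kubectl get pods -A         # kube-system pods should be Running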

Online NFS install

# Install nfs
dnf install -y nfs-utils
# Create the mount directory
mkdir -p /data/nfs
# Inspect the disks with lsblk, then format sda
mkfs.ext4 /dev/sda
# Mount it
mount /dev/sda /data/nfs
# Make the mount persistent
echo "/dev/sda /data/nfs ext4 defaults 0 0" | sudo tee -a /etc/fstab
# Configure nfs to allow access from the 192.168.10.0 subnet
echo "/data/nfs 192.168.10.0/24(rw,sync,no_root_squash)" | sudo tee -a /etc/exports
# Enable nfs
systemctl enable --now nfs-server
# Open the firewall
firewall-cmd --permanent --add-service=nfs
firewall-cmd --permanent --add-service=mountd
firewall-cmd --permanent --add-service=rpc-bind
firewall-cmd --reload
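
To verify the export from a client on the same subnet (a minimal check, not in the original; replace <nfs-server-ip> with the NFS server's address):

showmount -e <nfs-server-ip>                  # should list /data/nfs
mount -t nfs <nfs-server-ip>:/data/nfs /mnt   # the client needs nfs-utils installed
touch /mnt/write-test && umount /mnt          # confirm the export is writable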

Installing the NFS CSI driver in k8s

helm repo add csi-driver-nfs https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/charts
helm repo update
# Inside China the image pulls will most likely fail
helm install csi-driver-nfs csi-driver-nfs/csi-driver-nfs --namespace kube-system
# Check whether the install succeeded
kubectl --namespace=kube-system get pods --selector="app.kubernetes.io/instance=csi-driver-nfs" --watch
# Inspect a pod's startup details
kubectl -n kube-system describe pod csi-nfs-node-xxx
# Uninstall, to reinstall later
helm uninstall csi-driver-nfs -n kube-system
# Render the chart templates locally
helm template csi-driver-nfs csi-driver-nfs/csi-driver-nfs --namespace kube-system > nfs-driver.yaml
# See which parameters and values the chart supports
helm show values csi-driver-nfs/csi-driver-nfs
# List the images referenced by the templates
grep 'image:' nfs-driver.yaml
image: "registry.k8s.io/sig-storage/livenessprobe:v2.15.0"
image: "registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.13.0"
image: "registry.k8s.io/sig-storage/nfsplugin:v4.11.0"
image: "registry.k8s.io/sig-storage/csi-provisioner:v5.2.0"
image: "registry.k8s.io/sig-storage/csi-resizer:v1.13.1"
image: "registry.k8s.io/sig-storage/csi-snapshotter:v8.2.0"
image: "registry.k8s.io/sig-storage/livenessprobe:v2.15.0"
image: "registry.k8s.io/sig-storage/nfsplugin:v4.11.0"
# Working around unreachable images:
# pull the images from somewhere with access, then push them to the cluster's private registry
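
A minimal sketch of that workaround, assuming a machine with both internet access and docker; the target project (sig-storage under the harbor.ghspace.cn registry mentioned earlier) is illustrative, and the exact helm values for overriding image repositories should be confirmed with helm show values above:

SRC=registry.k8s.io/sig-storage
DST=harbor.ghspace.cn/sig-storage   # hypothetical project in the private registry
for img in livenessprobe:v2.15.0 csi-node-driver-registrar:v2.13.0 nfsplugin:v4.11.0 \
           csi-provisioner:v5.2.0 csi-resizer:v1.13.1 csi-snapshotter:v8.2.0; do
  docker pull "$SRC/$img"              # pull where access works
  docker tag "$SRC/$img" "$DST/$img"   # retag for the private registry
  docker push "$DST/$img"              # push so the cluster can reach it
done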

Offline Installation

Building the offline package

Building the k8s artifact

  1. Copy $HOME/.kube/config from the cluster to the machine running kk (skip this step if kk already runs on a cluster node).
  2. Add a hosts mapping, because the kubeconfig uses the domain configured at install time: run echo "192.168.10.219 lb.k8s.local" >> /etc/hosts. The exact entry can be found in the hosts file on any k8s node (skip this step if kk already runs on a cluster node).
  3. Generate manifest-sample.yaml by running ./kk create manifest --kubeconfig config.
  4. Export kubekey-artifact.tar.gz by running export KKZONE=cn followed by ./kk artifact export -m manifest-sample.yaml (the full sequence is consolidated in the sketch below).
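
Put together, the export sequence looks like this (same values as the steps above; the hosts entry applies only when kk is not on a cluster node):

    echo "192.168.10.219 lb.k8s.local" >> /etc/hosts   # step 2, off-cluster only
    ./kk create manifest --kubeconfig config           # step 3: generates manifest-sample.yaml
    export KKZONE=cn
    ./kk artifact export -m manifest-sample.yaml       # step 4: produces kubekey-artifact.tar.gz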

Building the k8s system dependency package

  1. Run the following on a k8s node to generate the offline package kk-rpms.tar.gz, containing the repodata/ metadata and the rpm packages.

    # Install createrepo (needed to generate a local yum repo); dnf-plugins-core provides the download plugin and may already be installed
    dnf install -y createrepo dnf-plugins-core
    # Create a directory to hold the dependency packages
    mkdir -p ~/kk-rpms
    cd ~/kk-rpms
    # Download conntrack, socat, tar and all of their dependencies
    dnf download --resolve --alldeps conntrack socat tar
    # Not recommended: dependencies are easily missed. If there are no dependencies, or you don't want
    # to download them, note this uses reinstall, not install, because these packages are already installed on this machine
    # dnf reinstall --downloadonly --downloaddir=. conntrack socat tar
    ll # check the downloaded rpm files
    # Generate the local yum repo metadata (for offline use)
    createrepo .
    ll # check the generated repodata directory
    # Package the rpms together with the repo metadata
    cd ..
    tar -czf kk-rpms.tar.gz kk-rpms
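    # Before shipping the archive, a quick check (not in the original) that the repo metadata made it in:
    tar -tzf kk-rpms.tar.gz | grep repodata/repomd.xml   # should print kk-rpms/repodata/repomd.xml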

Assembling the custom resource bundle

  1. Finally, arrange the files into the following directory structure

    kk-offline-install
    ├── config-sample.yaml # cluster install config; adjust node IPs and roles as needed
    ├── kk # the kk executable
    ├── kk-rpms.tar.gz # offline rpm package
    ├── kubekey-artifact.tar.gz # the k8s offline artifact
    └── manifest-sample.yaml # artifact manifest (not needed for install; kept to document what the package contains)
  2. Compress everything into the final offline installer by running tar -czf kk-offline-install.tar.gz kk-offline-install

Performing the offline install

  1. Pick a jump host, or one of the machines that will be part of the k8s cluster, upload kk-offline-install.tar.gz, and extract it with tar -xzf kk-offline-install.tar.gz. Note that the Rocky 9.5 minimal image does not ship the tar command, so you may need to extract the archive externally and upload the extracted files instead.

  2. Configure ssh remote env passing on every k8s node

    # Allow ssh (root ssh can be enabled during OS install) and remote env passing
    echo 'AcceptEnv LANG LC_*' | tee -a /etc/ssh/sshd_config.d/01-permitrootlogin.conf
    systemctl restart sshd
  3. Install the system dependencies: upload kk-rpms.tar.gz to every k8s node to be installed, extract it with tar -xzf kk-rpms.tar.gz into root's home directory (~root), then run

    # Run on every k8s node to be installed
    dnf config-manager --add-repo file:///root/kk-rpms
    echo "gpgcheck=0" >> /etc/yum.repos.d/root_kk-rpms.repo
    dnf install -y conntrack socat tar --disablerepo="*" --enablerepo="root_kk-rpms"
  4. Configure the registry. If you have an external registry (both docker registry and Harbor are supported), fill in its actual details; if not, edit config-sample.yaml (vi config-sample.yaml) as follows

    spec:
      roleGroups:
        registry: # the node that will host the private registry
        - node3
      registry: # for offline deployment a registry is mandatory, either local or external, to hold and serve images
        privateRegistry: "dockerhub.k8s.local" # no http:// prefix
        auths:
          "dockerhub.k8s.local":
            username: admin
            password: Harbor12345
            skipTLSVerify: true # with a self-signed cert, either skip verification or install the private CA on the local machine
        namespaceOverride: kubesphereio # must be overridden, otherwise image pulls will fail
        registryMirrors: []
        insecureRegistries: [] # don't bother trying http; kk's default deployment only exposes the https port
  5. Initialize the private image registry by running ./kk init registry -f config-sample.yaml -a kubekey-artifact.tar.gz

  6. (Optional) Only needed when kk is not running on one of the k8s nodes: add the hosts mapping echo "192.168.10.213 dockerhub.k8s.local" >> /etc/hosts. The exact entry can be found in the hosts file of any other k8s node.

  7. Push the images to the private registry by running ./kk artifact image push -f config-sample.yaml -a kubekey-artifact.tar.gz

  8. Run ./kk create cluster -f config-sample.yaml -a kubekey-artifact.tar.gz to install the offline cluster
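
After the install completes, a quick spot check (a minimal sketch, not part of the original steps) confirms the nodes are up and that the images really came from the private registry:

    kubectl get nodes -o wide                  # all nodes should be Ready
    crictl images | grep dockerhub.k8s.local   # images should carry the private registry prefix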

Offline install optimization (automation script)
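
This section is a placeholder: pitfall 9 below shows why kk's preInstall hook cannot do the node preparation and suggests a compiled Go tool. As a stopgap, here is a bash sketch that prepares every node non-interactively, assuming sshpass is available on the jump host and the nodes share the root password from config-sample.yaml:

#!/usr/bin/env bash
set -euo pipefail
NODES=(192.168.10.219 192.168.10.226 192.168.10.213)
PASS='gh@2025'
tar -xzf kk-rpms.tar.gz   # extract locally once: the minimal nodes have no tar yet
for ip in "${NODES[@]}"; do
  # copy the extracted repo directory (scp -r avoids needing tar on the node)
  sshpass -p "$PASS" scp -r -o StrictHostKeyChecking=no kk-rpms "root@$ip:/root/"
  sshpass -p "$PASS" ssh -o StrictHostKeyChecking=no "root@$ip" '
    dnf config-manager --add-repo file:///root/kk-rpms
    echo "gpgcheck=0" >> /etc/yum.repos.d/root_kk-rpms.repo
    dnf install -y conntrack socat tar --disablerepo="*" --enablerepo="root_kk-rpms"
    echo "AcceptEnv LANG LC_*" >> /etc/ssh/sshd_config.d/01-permitrootlogin.conf
    systemctl restart sshd
  '
done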

Appendix: configuration files

config-sample.yaml

apiVersion: kubekey.kubesphere.io/v1alpha2
kind: Cluster
metadata:
  name: sample
spec:
  hosts:
  - {name: node1, address: 192.168.10.219, internalAddress: 192.168.10.219, user: root, password: "gh@2025", arch: arm64}
  - {name: node2, address: 192.168.10.226, internalAddress: 192.168.10.226, user: root, password: "gh@2025", arch: arm64}
  - {name: node3, address: 192.168.10.213, internalAddress: 192.168.10.213, user: root, password: "gh@2025", arch: arm64}
  roleGroups:
    etcd:
    - node1
    - node2
    - node3
    control-plane:
    - node1
    - node2
    - node3
    worker:
    - node1
    - node2
    - node3
    registry:
    - node3
  controlPlaneEndpoint:
    ## Internal loadbalancer for apiservers
    # internalLoadbalancer: haproxy
    domain: lb.k8s.local
    address: "192.168.10.219"
    port: 6443
  kubernetes:
    version: v1.26.6
    clusterName: cluster.local
    autoRenewCerts: true
    containerManager: containerd
  etcd:
    type: kubekey
  network:
    plugin: calico
    kubePodsCIDR: 10.233.64.0/18
    kubeServiceCIDR: 10.233.0.0/18
    ## multus support. https://github.com/k8snetworkplumbingwg/multus-cni
    multusCNI:
      enabled: false
  registry: # for offline deployment a registry is mandatory, either local or external, to hold and serve images
    privateRegistry: "dockerhub.k8s.local"
    auths:
      "dockerhub.k8s.local":
        username: admin
        password: Harbor12345
        skipTLSVerify: true # with a self-signed cert, skip verification
    namespaceOverride: kubesphereio # must be overridden, otherwise image pulls will fail
    registryMirrors: []
    insecureRegistries: []
  addons: []

Pitfalls encountered

  1. No network access: ping www.baidu.com reports the host cannot be found.

    Cause: DNS misconfiguration.

    Fix: run

    nmcli connection modify enp3s0 ipv4.dns "8.8.8.8 183.221.253.100"
    nmcli connection up enp3s0
  2. The kk install fails with: failed to get SSH session: ssh: setenv failed

    Cause: Rocky ships with stricter sshd restrictions.

    Fix: allow the environment to be passed over ssh by running

    echo 'AcceptEnv LANG LC_*' | tee -a /etc/ssh/sshd_config.d/01-permitrootlogin.conf
    systemctl restart sshd
  3. The kk install reports the following errors:

    04:18:13 EDT [ERRO] node1: conntrack is required.
    04:18:13 EDT [ERRO] node1: socat is required.

    Cause: Rocky is missing conntrack and socat.

    Fix: install the dependencies with dnf install -y conntrack socat

  4. The kk install fails with: /bin/bash: line 1: tar: command not found: Process exited with status 127

    Cause: Rocky is missing tar.

    Fix: install it with dnf install -y tar

  5. The kk offline install fails with the following error

    FATA[0000] pulling image: failed to pull and unpack image "dockerhub.k8s.local/kubesphere/pause:3.9": failed to resolve reference "dockerhub.k8s.local/kubesphere/pause:3.9": failed to do request: Head "https://dockerhub.k8s.local/v2/kubesphere/pause/manifests/3.9": tls: failed to verify certificate: x509: certificate signed by unknown authority: Process exited with status 1

    Cause: the self-built registry serves https without a trusted certificate.

    Fix: add the following to config-sample.yaml

    spec:
      registry:
        insecureRegistries: ["dockerhub.k8s.local"]
  6. Pushing images with ./kk artifact image push -f config-sample.yaml -a kubekey-artifact.tar.gz fails as follows:

    Getting image source signatures
    trying to reuse blob sha256:ad5042aba4ea93ceb67882c49eb3fb8b806ffa201c5c6f0f90071702f09a9192 at destination: pinging container registry dockerhub.k8s.local: Get "https://dockerhub.k8s.local/v2/": x509: certificate signed by unknown authority
    20:51:14 EDT success: [LocalHost]
    20:51:14 EDT [CopyImagesToRegistryModule] Push multi-arch manifest to private registry
    20:51:14 EDT message: [LocalHost]
    get manifest list failed by module cache
    20:51:14 EDT failed: [LocalHost]
    error: Pipeline[ArtifactImagesPushPipeline] execute failed: Module[CopyImagesToRegistryModule] exec failed:
    failed: [LocalHost] [PushManifest] exec failed after 1 retries: get manifest list failed by module cache
    # test whether port 80 is open: it is not
    [root@192-168-10-87 kk-offline-install]# curl http://dockerhub.k8s.local/v2/_catalog
    curl: (7) Failed to connect to dockerhub.k8s.local port 80: Connection refused
    # test https
    [root@192-168-10-87 kk-offline-install]# curl https://dockerhub.k8s.local/v2/_catalog
    curl: (60) SSL certificate problem: unable to get local issuer certificate
    More details here: https://curl.se/docs/sslcerts.html

    curl failed to verify the legitimacy of the server and therefore could not
    establish a secure connection to it. To learn more about this situation and
    how to fix it, please visit the web page mentioned above.

    Cause: the machine running kk does not trust the registry's private certificate.

    Fix:

    Option 1: copy /etc/docker/certs.d/dockerhub.k8s.local/ca.crt from the registry node into the /etc/pki/ca-trust/source/anchors/ directory on the machine running kk, then execute:

    [root@192-168-10-87 anchors]# update-ca-trust extract
    # verify: the certificate error is gone
    [root@192-168-10-87 kk-offline-install]# curl https://dockerhub.k8s.local/v2/_catalog
    {"repositories":[]}

    Option 2: skip certificate verification

    spec:
      registry:
        privateRegistry: "dockerhub.k8s.local"
        auths:
          "dockerhub.k8s.local":
            username: admin
            password: Harbor12345
            skipTLSVerify: true # with a self-signed cert, skip verification
  7. When enabling an http registry, the official documentation example suggests the configuration below

    registry:
      privateRegistry: "http://dockerhub.k8s.local"
      auths:
        "dockerhub.k8s.local":
          username: admin
          password: Harbor12345
          plainHTTP: true # enable http
      namespaceOverride: ""
      registryMirrors: []
      insecureRegistries: ["dockerhub.k8s.local"]

    Cause: it turns out the registry is not deployed on the http port at all

    21
    [root@node3 ~]# ss -tlnp | grep -E '80|443'
    LISTEN 0 32768 *:443 *:* users:(("registry",pid=5664,fd=3))
    [root@node3 ~]# ^C
    [root@node3 ~]# ps aux | grep registry
    root 5664 0.0 0.1 121796 25028 ? Ssl 21:29 0:00 /usr/local/bin/registry serve /etc/kubekey/registry/config.yaml
    root 6251 0.0 0.0 6116 1920 pts/0 S+ 21:34 0:00 grep --color=auto registry
    [root@node3 ~]# cat /etc/kubekey/registry/config.yaml
    version: 0.1
    log:
      fields:
        service: registry
    storage:
      cache:
        layerinfo: inmemory
      filesystem:
        rootdirectory: /mnt/registry
    http:
      addr: :443
      tls:
        certificate: /etc/ssl/registry/ssl/http:.pem
        key: /etc/ssl/registry/ssl/http:-key.pem

    Fix: switch back to an https deployment

  8. The kk offline install command fails with the following error

    FATA[0000] pulling image: rpc error: code = NotFound desc = failed to pull and unpack image "dockerhub.k8s.local/kubesphere/pause:3.9": failed to resolve reference "dockerhub.k8s.local/kubesphere/pause:3.9": dockerhub.k8s.local/kubesphere/pause:3.9: not found: Process exited with status 1

    Cause: dockerhub.k8s.local/kubesphere/pause:3.9 cannot be found, because the offline bundle actually stores the image as dockerhub.k8s.local/kubesphereio/pause:3.9, with an extra io.

    Fix: edit config-sample.yaml as below, then rerun the kk offline install command

    spec:
      registry:
        namespaceOverride: kubesphereio # add the namespace override
  9. Editing manifest-sample.yaml to add the following under spec:

    spec:
      preInstall:
      - name: install-local-rpms
        commands:
        - echo "===> Extracting the local RPM offline package..."
        - mkdir -p /opt/kk-rpms
        - if [ ! -d /opt/kk-rpms/repodata ]; then
            echo "===> Extracting kk-rpms.tar.gz for the first time";
            tar -xzf ./kk-rpms.tar.gz -C /opt/kk-rpms;
          else
            echo "===> RPM repo already present, skipping extraction";
          fi

        - echo "===> Configuring the local yum repo..."
        - if ! dnf repolist | grep -q "kk-local"; then
            echo -e "[kk-local]\nname=KK Offline Repo\nbaseurl=file:///opt/kk-rpms\nenabled=1\ngpgcheck=0" > /etc/yum.repos.d/kk-local.repo;
          else
            echo "===> KK local repo already configured, skipping";
          fi

        - echo "===> Installing conntrack socat tar (if missing)..."
        - dnf install -y conntrack socat tar || echo "===> ignoring already-installed packages"

    Cause: testing shows preInstall has no effect; it is never executed.

    Fix: you may need to script the node preparation yourself. A plain shell script would require typing passwords and accepting host keys by hand, so writing the tool in Go and compiling it into a single executable enables one-click deployment (a bash stopgap is sketched in the optimization section above).

  10. The kk install fails with the following error

    # kk install error output
    failed: [node3] [RestartETCD] exec failed after 3 retries: start etcd failed: Failed to exec command: sudo -E /bin/bash -c "systemctl daemon-reload && systemctl restart etcd && systemctl enable etcd"
    Job for etcd.service failed because a timeout was exceeded.
    See "systemctl status etcd.service" and "journalctl -xeu etcd.service" for details.: Process exited with status 1
    # detailed error from etcd
    [root@192-168-10-30 ~]# journalctl -xeu etcd | tail -50
    Jul 04 10:34:38 node1 etcd[41618]: {"level":"warn","ts":"2025-07-04T10:34:38.880411+0800","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.10.31:41466","server-name":"","error":"remote error: tls: bad certificate"}
    Jul 04 10:34:38 node1 etcd[41618]: {"level":"warn","ts":"2025-07-04T10:34:38.886267+0800","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"6808450bb7874830","rtt":"0s","error":"tls: failed to verify certificate: x509: certificate is valid for 127.0.0.1, ::1, 192.168.10.219, 192.168.10.226, 192.168.10.213, not 192.168.10.31"}
    Jul 04 10:34:38 node1 etcd[41618]: {"level":"warn","ts":"2025-07-04T10:34:38.915113+0800","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.10.32:51858","server-name":"","error":"remote error: tls: bad certificate"}

    Cause: kk had previously installed another cluster. Running with --debug shows that kk copies the certificates from a local directory, which still holds the previous cluster's certificate material.

    [root@192-168-10-87 ~]# ./kk create cluster -f config-sample.yaml --debug
    22:42:41 EDT scp local file /root/kubekey/pki/etcd/node-node1-key.pem to remote /tmp/kubekey/etc/ssl/etcd/ssl/node-node1-key.pem success
    [root@192-168-10-87 ~]# ll /root/kubekey/pki/etcd/
    total 80
    -rw-------. 1 root root 1679 Jul 3 22:42 admin-node1-key.pem
    -rw-r--r--. 1 root root 1375 Jul 3 22:42 admin-node1.pem

    Fix: remove the kubekey directory that sits alongside the kk binary.
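
    A slightly safer variant (a suggestion, not part of the original fix) is to move the directory aside instead of deleting it, in case the old cluster's certificates are ever needed again:

    mv ./kubekey ./kubekey.bak.$(date +%s)   # park the stale certs instead of deleting them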