# Installing ARM k8s on Rocky 9 with kk

## Environment

| Type | Version |
| --- | --- |
| Physical machine | ARM |
| Base OS | Rocky-9.5-aarch64-minimal.iso |
| kk (KubeKey) | 3.1.9 |
| k8s | 1.26.6 |
## Online Installation

### Consolidated Steps

#### Online k8s installation

```bash
# Configure DNS (optional, depending on your network)
```
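For reference, a minimal sketch of what the online install looks like with kk (download URL and flags per the upstream KubeKey docs; the versions come from the environment table, so treat the exact invocation as an assumption):

```bash
# Fetch the kk binary (KKZONE=cn uses the China mirror)
export KKZONE=cn
curl -sfL https://get-kk.kubesphere.io | VERSION=v3.1.9 sh -

# Create the cluster online with the k8s version from the environment table
./kk create cluster --with-kubernetes v1.26.6
```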
#### Online NFS installation

```bash
# Install NFS
```
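A minimal sketch of an NFS server setup on Rocky (the export path `/data/nfs` and the wide-open export rule are assumptions; tighten them for your network):

```bash
# Install and enable an NFS server, exporting /data/nfs
dnf install -y nfs-utils
mkdir -p /data/nfs
echo "/data/nfs *(rw,sync,no_root_squash)" >> /etc/exports
systemctl enable --now nfs-server
exportfs -rav   # re-export and list the active exports
```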
#### Installing the NFS CSI driver on k8s

```bash
helm repo add csi-driver-nfs https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/charts
```
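After adding the repo, the chart can be installed following the upstream csi-driver-nfs instructions (the namespace choice follows upstream; pin a `--version` compatible with k8s 1.26 if needed):

```bash
# Install the NFS CSI driver chart into kube-system
helm install csi-driver-nfs csi-driver-nfs/csi-driver-nfs --namespace kube-system
```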
## Offline Installation

### Offline Package Preparation

#### Building the k8s artifact

- Copy `$HOME/.kube/config` to the machine running kk (skip this step if kk already sits on a cluster node).
- Add a hosts mapping, because the config uses the domain name chosen at install time: run `echo "192.168.10.219 lb.k8s.local" >> /etc/hosts`. The exact value can be found in the hosts file of any k8s node (skip this step if kk already sits on a cluster node).
- Generate `manifest-sample.yaml` by running `./kk create manifest --kubeconfig config`.
- Export `kubekey-artifact.tar.gz` by running `export KKZONE=cn` followed by `./kk artifact export -m manifest-sample.yaml` (the consolidated sequence is sketched below).
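Consolidated, the steps above look like this (file names and the hosts entry are the ones used in this walkthrough):

```bash
echo "192.168.10.219 lb.k8s.local" >> /etc/hosts   # skip if kk runs on a cluster node
./kk create manifest --kubeconfig config           # generates manifest-sample.yaml
export KKZONE=cn
./kk artifact export -m manifest-sample.yaml       # produces kubekey-artifact.tar.gz
```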
#### Building the k8s system dependencies

Run the following on a k8s node to produce the offline package `kk-rpms.tar.gz`, which bundles the RPMs together with `repodata/` metadata.

```bash
# Install createrepo (needed to generate a local yum repo) and
# dnf-plugins-core (used to download packages); they may already be installed
dnf install -y createrepo dnf-plugins-core
# Create a directory to hold the dependency packages
mkdir -p ~/kk-rpms
cd ~/kk-rpms
# Download conntrack, socat, tar and all their dependencies
dnf download --resolve --alldeps conntrack socat tar
# Not recommended -- dependencies are easily missed. If there are no
# dependencies, or you don't want to download them, note this uses
# `reinstall`, not `install`, because the packages are already installed:
# dnf reinstall --downloadonly --downloaddir=. conntrack socat tar
ll   # check the downloaded rpm files
# Generate local yum repo metadata (for easy offline use)
createrepo .
ll   # check the generated repodata directory
# Package the rpms and repo metadata
cd ..
tar -czf kk-rpms.tar.gz kk-rpms
```
#### Assembling the combined resource bundle

Finally, arrange the files into the following directory structure:

```text
kk-offline-install
├── config-sample.yaml       # cluster install config; adjust node IPs and master/worker roles as needed
├── kk                       # the kk executable
├── kk-rpms.tar.gz           # offline rpm package
├── kubekey-artifact.tar.gz  # k8s offline artifact
└── manifest-sample.yaml     # artifact manifest (not strictly needed; kept for inspecting what the package contains)
```

Compress it into the final offline install package:

```bash
tar -czf kk-offline-install.tar.gz kk-offline-install
```
### Offline Installation

Pick a jump host, or one of the machines planned for the k8s install, upload `kk-offline-install.tar.gz`, and extract it with `tar -xzf kk-offline-install.tar.gz`. Note that the Rocky 9.5 minimal image does not ship the `tar` command, so you can extract the archive externally and upload the result instead.

Configure SSH remote env on all k8s nodes:

```bash
# Allow ssh (root ssh can be enabled while installing the OS) and remote env
echo 'AcceptEnv LANG LC_*' | tee -a /etc/ssh/sshd_config.d/01-permitrootlogin.conf
systemctl restart sshd
```

Install the system dependencies: upload `kk-rpms.tar.gz` to every k8s node to be installed, put it in the `~` (root) directory, extract it with `tar -xzf kk-rpms.tar.gz`, then run:

```bash
# Run on every k8s node to be installed
dnf config-manager --add-repo file:///root/kk-rpms
echo "gpgcheck=0" >> /etc/yum.repos.d/root_kk-rpms.repo
dnf install -y conntrack socat tar --disablerepo="*" --enablerepo="root_kk-rpms"
```
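A quick way to confirm the local repo took effect on a node:

```bash
# The offline repo should be listed, and the three tools should now resolve
dnf repolist
command -v conntrack socat tar
```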
Modify the registry configuration. If you have an external registry (docker registry and Harbor are both supported), fill in its actual details; otherwise edit `vi config-sample.yaml` as follows:

```yaml
spec:
  roleGroups:
    registry:   # the node that hosts the private registry
      - node3
  registry:     # for offline deployment a registry is mandatory -- local or external -- to store and pull images
    privateRegistry: "dockerhub.k8s.local"   # do not add http
    auths:
      "dockerhub.k8s.local":
        username: admin
        password: Harbor12345
        skipTLSVerify: true   # with a self-signed cert, skip verification, or copy the private cert and install it on the local machine
    namespaceOverride: kubesphereio   # must be overridden, otherwise images cannot be pulled
    registryMirrors: []
    insecureRegistries: []   # don't bother with http: kk's default deployment only exposes the https port
```

1. Initialize the private image registry: `./kk init registry -f config-sample.yaml -a kubekey-artifact.tar.gz`
2. (Optional) Only if kk is not on one of the k8s nodes, add a hosts mapping: `echo "192.168.10.213 dockerhub.k8s.local" >> /etc/hosts`. The exact entry can be found in the hosts file of the other k8s nodes.
3. Push the images to the private registry: `./kk artifact image push -f config-sample.yaml -a kubekey-artifact.tar.gz`
4. Run `./kk create cluster -f config-sample.yaml -a kubekey-artifact.tar.gz` to perform the offline cluster installation.
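After the push, the registry contents can be sanity-checked with the standard v2 catalog endpoint (use `-k` if the registry's CA is not yet trusted on this machine):

```bash
# List the repositories now stored in the private registry
curl -k https://dockerhub.k8s.local/v2/_catalog
```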
## Offline Install Optimization (Automation Script)

## Configuration File Appendix

config-sample.yaml:

```yaml
apiVersion: kubekey.kubesphere.io/v1alpha2
```
## Pitfalls

No network access: `ping www.baidu.com` reports the host cannot be found.

Cause: DNS problem.

Fix: run

```bash
nmcli connection modify enp3s0 ipv4.dns "8.8.8.8 183.221.253.100"
nmcli connection up enp3s0
```

KK install reports: `failed to get SSH session: ssh: setenv failed`

Cause: Rocky applies finer-grained permission controls.

Fix: allow `setenv` by running

```bash
echo 'AcceptEnv LANG LC_*' | tee -a /etc/ssh/sshd_config.d/01-permitrootlogin.conf
systemctl restart sshd
```
kk install reports:

```text
04:18:13 EDT [ERRO] node1: conntrack is required.
04:18:13 EDT [ERRO] node1: socat is required.
```

Cause: Rocky is missing `conntrack` and `socat`.

Fix: install the dependencies with `dnf install -y conntrack socat`.
kk install reports: `/bin/bash: line 1: tar: command not found: Process exited with status 127`

Cause: Rocky is missing `tar`.

Fix: install it with `dnf install -y tar`.
kk offline install reports:

```text
FATA[0000] pulling image: failed to pull and unpack image "dockerhub.k8s.local/kubesphere/pause:3.9": failed to resolve reference "dockerhub.k8s.local/kubesphere/pause:3.9": failed to do request: Head "https://dockerhub.k8s.local/v2/kubesphere/pause/manifests/3.9": tls: failed to verify certificate: x509: certificate signed by unknown authority: Process exited with status 1
```

Cause: the self-built registry serves https without a trusted certificate.

Fix: add the following to `config-sample.yaml`:

```yaml
spec:
  registry:
    insecureRegistries: ["dockerhub.k8s.local"]
```
Running the image push command `./kk artifact image push -f config-sample.yaml -a kubekey-artifact.tar.gz` fails with:

```text
Getting image source signatures
trying to reuse blob sha256:ad5042aba4ea93ceb67882c49eb3fb8b806ffa201c5c6f0f90071702f09a9192 at destination: pinging container registry dockerhub.k8s.local: Get "https://dockerhub.k8s.local/v2/": x509: certificate signed by unknown authority
20:51:14 EDT success: [LocalHost]
20:51:14 EDT [CopyImagesToRegistryModule] Push multi-arch manifest to private registry
20:51:14 EDT message: [LocalHost]
get manifest list failed by module cache
20:51:14 EDT failed: [LocalHost]
error: Pipeline[ArtifactImagesPushPipeline] execute failed: Module[CopyImagesToRegistryModule] exec failed:
failed: [LocalHost] [PushManifest] exec failed after 1 retries: get manifest list failed by module cache
```

```bash
# Test whether port 80 is open -- it is not
[root@192-168-10-87 kk-offline-install]# curl http://dockerhub.k8s.local/v2/_catalog
curl: (7) Failed to connect to dockerhub.k8s.local port 80: Connection refused
# Test https
[root@192-168-10-87 kk-offline-install]# curl https://dockerhub.k8s.local/v2/_catalog
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
```

Cause: the machine running kk does not have the registry's private certificate.

Fix:

Option 1: copy `/etc/docker/certs.d/dockerhub.k8s.local/ca.crt` from the registry node to `/etc/pki/ca-trust/source/anchors/` on the machine running kk, then run:

```bash
[root@192-168-10-87 anchors]# update-ca-trust extract
# Verify -- the certificate error is gone
[root@192-168-10-87 kk-offline-install]# curl https://dockerhub.k8s.local/v2/_catalog
{"repositories":[]}
```

Option 2: skip certificate verification:

```yaml
spec:
  registry:
    privateRegistry: "dockerhub.k8s.local"
    auths:
      "dockerhub.k8s.local":
        username: admin
        password: Harbor12345
        skipTLSVerify: true   # with a self-signed cert, skip verification
```
When configuring the registry to serve plain http, using the configuration from the official documentation example:

```yaml
registry:
  privateRegistry: "http://dockerhub.k8s.local"
  auths:
    "dockerhub.k8s.local":
      username: admin
      password: Harbor12345
      plainHTTP: true   # enable http
  namespaceOverride: ""
  registryMirrors: []
  insecureRegistries: ["dockerhub.k8s.local"]
```

Cause: it turns out the registry is not deployed on the http port at all:

```bash
[root@node3 ~]# ss -tlnp | grep -E '80|443'
LISTEN 0      32768              *:443             *:*    users:(("registry",pid=5664,fd=3))
[root@node3 ~]# ps aux | grep registry
root        5664  0.0  0.1 121796 25028 ?        Ssl  21:29   0:00 /usr/local/bin/registry serve /etc/kubekey/registry/config.yaml
root        6251  0.0  0.0   6116  1920 pts/0    S+   21:34   0:00 grep --color=auto registry
[root@node3 ~]# cat /etc/kubekey/registry/config.yaml
version: 0.1
log:
  fields:
    service: registry
storage:
  cache:
    layerinfo: inmemory
  filesystem:
    rootdirectory: /mnt/registry
http:
  addr: :443
  tls:
    certificate: /etc/ssl/registry/ssl/http:.pem
    key: /etc/ssl/registry/ssl/http:-key.pem
```

Fix: go back to deploying with https.
Running the kk offline install command reports:

```text
FATA[0000] pulling image: rpc error: code = NotFound desc = failed to pull and unpack image "dockerhub.k8s.local/kubesphere/pause:3.9": failed to resolve reference "dockerhub.k8s.local/kubesphere/pause:3.9": dockerhub.k8s.local/kubesphere/pause:3.9: not found: Process exited with status 1
```

Cause: `dockerhub.k8s.local/kubesphere/pause:3.9` cannot be found because the image actually stored in the offline artifact is named `dockerhub.k8s.local/kubesphereio/pause:3.9` -- with an extra `io`.

Fix: modify `config-sample.yaml`, then rerun the kk offline install command:

```yaml
spec:
  registry:
    namespaceOverride: kubesphereio   # add the namespace override
```
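To confirm the overridden name matches what the registry actually holds, a manual pull from any node works (assuming `crictl` is available, as on a containerd node deployed by kk):

```bash
# Pull the pause image under the kubesphereio namespace
crictl pull dockerhub.k8s.local/kubesphereio/pause:3.9
```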
Modifying the `manifest-sample.yaml` file by adding the following under `spec:`:

```yaml
spec:
  preInstall:
    - name: install-local-rpms
      commands:
        - echo "===> Extracting the local RPM offline package..."
        - mkdir -p /opt/kk-rpms
        - if [ ! -d /opt/kk-rpms/repodata ]; then
            echo "===> First extraction of kk-rpms.tar.gz";
            tar -xzf ./kk-rpms.tar.gz -C /opt/kk-rpms;
          else
            echo "===> RPM repo already present, skipping extraction";
          fi
        - echo "===> Configuring the local yum repo..."
        - if ! dnf repolist | grep -q "kk-local"; then
            echo -e "[kk-local]\nname=KK Offline Repo\nbaseurl=file:///opt/kk-rpms\nenabled=1\ngpgcheck=0" > /etc/yum.repos.d/kk-local.repo;
          else
            echo "===> KK local repo already configured, skipping";
          fi
        - echo "===> Installing conntrack socat tar (if missing)..."
        - dnf install -y conntrack socat tar || echo "===> Ignoring already-installed components"
```

Cause: testing shows `preInstall` has no effect -- it is never executed at all.

Fix: you may need to write your own script to do this manually. A shell script means typing passwords and confirming host keys by hand, so one option is to write the script in Go and compile it into an executable for one-click deployment.
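In the meantime, a plain bash sketch of that automation (hypothetical: it assumes passwordless SSH to root on each node has already been set up with `ssh-copy-id`, which sidesteps the password and host-key prompts mentioned above; the node list is illustrative):

```bash
#!/usr/bin/env bash
# Hypothetical helper: ship kk-rpms.tar.gz to every node and install the deps.
set -euo pipefail
NODES=(192.168.10.30 192.168.10.31 192.168.10.32)  # illustrative node IPs

for node in "${NODES[@]}"; do
  scp kk-rpms.tar.gz "root@${node}:~/"
  ssh "root@${node}" '
    tar -xzf ~/kk-rpms.tar.gz -C ~ &&
    dnf config-manager --add-repo file:///root/kk-rpms &&
    echo "gpgcheck=0" >> /etc/yum.repos.d/root_kk-rpms.repo &&
    dnf install -y conntrack socat tar --disablerepo="*" --enablerepo="root_kk-rpms"
  '
done
```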
Running the kk install reports:

```text
# kk install error message
failed: [node3] [RestartETCD] exec failed after 3 retries: start etcd failed: Failed to exec command: sudo -E /bin/bash -c "systemctl daemon-reload && systemctl restart etcd && systemctl enable etcd"
Job for etcd.service failed because a timeout was exceeded.
See "systemctl status etcd.service" and "journalctl -xeu etcd.service" for details.: Process exited with status 1
```

```bash
# Details of the etcd failure
[root@192-168-10-30 ~]# journalctl -xeu etcd | tail -50
Jul 04 10:34:38 node1 etcd[41618]: {"level":"warn","ts":"2025-07-04T10:34:38.880411+0800","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.10.31:41466","server-name":"","error":"remote error: tls: bad certificate"}
Jul 04 10:34:38 node1 etcd[41618]: {"level":"warn","ts":"2025-07-04T10:34:38.886267+0800","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"6808450bb7874830","rtt":"0s","error":"tls: failed to verify certificate: x509: certificate is valid for 127.0.0.1, ::1, 192.168.10.219, 192.168.10.226, 192.168.10.213, not 192.168.10.31"}
Jul 04 10:34:38 node1 etcd[41618]: {"level":"warn","ts":"2025-07-04T10:34:38.915113+0800","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.10.32:51858","server-name":"","error":"remote error: tls: bad certificate"}
```

Cause: kk had installed another cluster before. Running with `--debug` shows that it copies the certificates from the local machine, and they are still the previous cluster's install certificates:

```bash
[root@192-168-10-87 ~]# ./kk create cluster -f config-sample.yaml --debug
22:42:41 EDT scp local file /root/kubekey/pki/etcd/node-node1-key.pem to remote /tmp/kubekey/etc/ssl/etcd/ssl/node-node1-key.pem success
[root@192-168-10-87 ~]# ll /root/kubekey/pki/etcd/
total 80
-rw-------. 1 root root 1679 Jul  3 22:42 admin-node1-key.pem
-rw-r--r--. 1 root root 1375 Jul  3 22:42 admin-node1.pem
```

Fix: delete the `kubekey` directory that sits alongside the kk binary.
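Concretely, on the machine running kk (the path follows from the debug output above):

```bash
# Remove the stale certificates and state left by the previous cluster,
# then rerun the installation so fresh certs are generated
rm -rf ./kubekey
./kk create cluster -f config-sample.yaml -a kubekey-artifact.tar.gz
```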