One-click Kubernetes setup on Alibaba Cloud with kk (KubeKey)
Multi-node installation with kk
Preparation
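Create an Auto Scaling group to manage the ECS virtual machines.
Set up the ECS instance configuration for the scaling group (for learning purposes, preemptible instances keep the cost down).
Example: ecs.hfc6.large (the AMD-based ecs.c7a.large also works), preemptible, 2 vCPU + 4 GiB, CentOS 7.9 64-bit. The instances can mount the same shared data disk, used to store configuration data.
Configure a certificate; logins use the cer certificate.
Port requirements: the ports below must be open in the security group. Because all nodes here share one security group whose intra-group policy allows members to reach each other, no extra rules were needed.

Service                                Port          Protocol
ssh                                    22            TCP
etcd                                   2379-2380     TCP
apiserver                              6443          TCP
calico                                 9099-9100     TCP
bgp                                    179           TCP
nodeport                               30000-32767   TCP
master                                 10250-10258   TCP
dns                                    53            TCP/UDP
local-registry (offline environments)  5000          TCP
local-apt (offline environments)       5080          TCP
rpcbind (needed when using NFS)        111           TCP
ipip (Calico needs the IPIP protocol)  -             IPENCAP / IPIP
metrics-server                         8443          TCP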
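To confirm that the nodes can really reach each other on these ports, a quick check from one node to another can help. The following is only a minimal sketch and not part of the original setup: the helper names expand_ports and check_ports are made up for illustration, and it assumes bash and the coreutils timeout command are available on the node.

```shell
#!/bin/sh
# Hypothetical helpers, not from the original post.

# Expand "2379-2380" into one port per line; a single port is printed unchanged.
expand_ports() {
  first=${1%-*}
  last=${1#*-}
  seq "$first" "$last"
}

# Try a TCP connection to every given port or port range on a host.
# Assumes bash (for /dev/tcp) and timeout are installed; gives up after 2 seconds per port.
check_ports() {
  host=$1
  shift
  for range in "$@"; do
    for port in $(expand_ports "$range"); do
      if timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
        echo "$host:$port open"
      else
        echo "$host:$port closed or filtered"
      fi
    done
  done
}

# Example call (the IP is one of the node addresses seen in the logs of this post):
# check_ports 192.168.14.17 22 2379-2380 6443 10250
```

Before the cluster is installed, most of these ports will show as closed simply because nothing is listening on them yet, so the check is more telling once the components are running.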
Install Kubernetes only
    yum update -y
Install Kubernetes + KubeSphere
    yum update -y
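As a rough sketch of the usual KubeKey flow for both variants, assuming the documented `create config` and `create cluster` subcommands of kk: the file names and versions are taken from the appendix of this post, the function names are made up, and `run` is a hypothetical wrapper that only prints each command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper: prints the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Generate a deployment file. A third argument adds KubeSphere to the installation.
kk_config() {
  if [ -n "${3:-}" ]; then
    run ./kk create config --with-kubernetes "$2" --with-kubesphere "$3" -f "$1"
  else
    run ./kk create config --with-kubernetes "$2" -f "$1"
  fi
}

# Create the cluster from a deployment file.
# The hosts and roleGroups sections of the file must be filled in first.
kk_cluster() {
  run ./kk create cluster -f "$1"
}

# Kubernetes only
kk_config config-kubernetes-1.23.7.yaml v1.23.7
kk_cluster config-kubernetes-1.23.7.yaml

# Kubernetes + KubeSphere
kk_config config-kubesphere3.3.0-kubernetes1.23.7.yaml v1.23.7 v3.3.0
kk_cluster config-kubesphere3.3.0-kubernetes1.23.7.yaml
```

Keeping config generation and cluster creation as separate steps leaves room to edit the generated file in between.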
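Add/remove nodes
    # If the earlier deployment file is no longer available, generate one with the command below. The generated file is named sample.yaml; I renamed it to add-node.yaml.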
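A minimal sketch of the add/remove flow, assuming the standard KubeKey subcommands `create config --from-cluster`, `add nodes`, and `delete node`. node3 is only an example node name, the function names are made up, and `run` is a hypothetical wrapper that prints each command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper: prints the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Regenerate a deployment file (sample.yaml) from the running cluster.
kk_config_from_cluster() { run ./kk create config --from-cluster; }

# Add the nodes that were newly listed under hosts and roleGroups in the file.
kk_add_nodes() { run ./kk add nodes -f "$1"; }

# Remove a single node by name.
kk_delete_node() { run ./kk delete node "$1" -f "$2"; }

kk_config_from_cluster
run mv sample.yaml add-node.yaml
# Edit add-node.yaml here: add the new host and assign its role.
kk_add_nodes add-node.yaml
kk_delete_node node3 add-node.yaml
```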
Add/remove taints
    # Add a taint to node1 with key key1, value value1, and effect NoSchedule. Only Pods with a toleration matching this taint can be scheduled onto node1.
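The taint described above maps onto `kubectl taint` as sketched below. The function names are made up for illustration, and `run` is a hypothetical wrapper that prints each command unless DRY_RUN=0 is set, so the snippet is safe to try without a cluster.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper: prints the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Add a taint: Pods without a matching toleration are no longer scheduled onto the node.
add_taint() { run kubectl taint nodes "$1" "$2=$3:$4"; }

# Remove the same taint: identical arguments plus a trailing "-".
remove_taint() { run kubectl taint nodes "$1" "$2=$3:$4-"; }

# List the taints currently set on every node.
list_taints() { run kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints; }

add_taint node1 key1 value1 NoSchedule
list_taints
remove_taint node1 key1 value1 NoSchedule
```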
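Notes
- If a change can be made in the Deployment, make it there. Edit a Pod's configuration directly only when that is not possible, because changes made to a Pod are lost when the Pod restarts.
Common problems
When installing a multi-node cluster with kk, the following error is reported: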
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s [kubelet-check] Initial timeout of 40s passed.
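Solution: the AMD host was the problem; switching to a host with an Intel CPU fixed it. A newer fix is to run the following before installing:
    yum update -y

After one node went down and a new node was added to replace it, running the kk add-node command failed with the following error: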
etcd health check failed: Failed to exec command: sudo -E /bin/bash -c "export ETCDCTL_API=2;export ETCDCTL_CERT_FILE='/etc/ssl/etcd/ssl/admin-node1.pem';export ETCDCTL_KEY_FILE='/etc/ssl/etcd/ssl/admin-node1-key.pem';export ETCDCTL_CA_FILE='/etc/ssl/etcd/ssl/ca.pem';/usr/local/bin/etcdctl --endpoints=https://192.168.14.16:2379,https://192.168.14.17:2379,https://192.168.14.20:2379 cluster-health | grep -q 'cluster is healthy'
Solution: first remove the failed member from etcd, then rerun kk to add the etcd and master nodes.
Running
    kubectl drain node3 --force --ignore-daemonsets --delete-emptydir-data
to remove the node reports the following error:
I0705 16:41:20.004286 18301 request.go:665] Waited for 1.14877279s due to client-side throttling, not priority and fairness, request: GET:https://lb.kubesphere.local:6443/api/v1/namespaces/kubesphere-monitoring-system/pods/alertmanager-main-1
Solution: cancel the drain and force the removal by running
    kubectl delete node node3

After the node is rebuilt, Pods cannot be scheduled onto the newly added node, and the following error is reported:
0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 Insufficient cpu.
Solution: check the taints on the cluster nodes, then run
    kubectl taint nodes node4 node-role.kubernetes.io/master=:NoSchedule-
to remove the taint.

After the node is rebuilt, the container events of the monitoring component prometheus-k8s show the following error:
MountVolume.NewMounter initialization failed for volume "pvc-60891ee0-ba6c-4df4-b381-6e542b27d3a7" : path "/var/openebs/local/pvc-60891ee0-ba6c-4df4-b381-6e542b27d3a7" does not exist
Attempted fix: run the following on the master node. This did not solve the problem; whether the storage volume is distributed still needs to be verified.
    # In /etc/kubernetes/manifests/kube-apiserver.yaml:
    #spec:
    #  containers:
    #  - command:
    #    - kube-apiserver
    #    - --feature-gates=RemoveSelfLink=false   # add this line
    vim /etc/kubernetes/manifests/kube-apiserver.yaml
    # Apply the configuration
    kubectl apply -f /etc/kubernetes/manifests/kube-apiserver.yaml

When installing KubeSphere on an AMD host, the installation stays stuck at
Please wait for the installation to complete:
Checking the Pod logs showed that the calico-node-4hgbb Pod reports the following error:
    Type     Reason     Age                    From     Message
    ----     ------     ----                   ----     -------
    Warning  Unhealthy  3m18s (x440 over 66m)  kubelet  (combined from similar events): Readiness probe failed: 2022-07-06 02:39:53.164 [INFO][4974] confd/health.go 180: Number of node(s) with BGP peering established = 2
    calico/node is not ready: felix is not ready: readiness probe reporting 503
Reference: "kubernetes v1.24.0 install failed with calico node not ready" #1282
Solution: resolved by changing the Calico version; the Calico version probably needs to be updated from v3.20.0 to v3.23.0.
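    # Delete the Calico-related resources
    kubectl -n kube-system get all | grep calico | awk '{print $1}' | xargs kubectl -n kube-system delete
    # Download the newer v3.23 manifest
    wget https://docs.projectcalico.org/archive/v3.23/manifests/calico.yaml
    # Reinstall Calico
    kubectl apply -f calico.yaml
    # Calico recovered, but reinstalling with kk later brought back the broken state. Note: do not edit the Pod's configuration; edit the Deployment instead.
    # Final fix: run this before installing the cluster
    yum update -y

System components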
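-> Monitoring -> prometheus-k8s -> Events -> error log: 0/3 nodes are available: 3 Insufficient cpu.
Solution: edit Workloads -> StatefulSets -> prometheus-k8s and lower the resource requests.
Summary: setting requests.cpu to 0.5 requests half a CPU. 0.5 is equivalent to 500m, read as "500 millicpu" (five hundred millicores). Official reference: "Resource units in Kubernetes".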
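The relationship between the two notations can be checked with a small helper. This is only an illustrative sketch: to_millicpu is a made-up function name, and it handles just the plain decimal form and the "m" suffix.

```shell
#!/bin/sh
# Hypothetical helper: convert a Kubernetes CPU quantity to millicpu.
# Values ending in "m" are already millicpu; plain numbers are whole CPUs.
to_millicpu() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  awk -v v="$1" 'BEGIN { printf "%d\n", v * 1000 + 0.5 }' ;;
  esac
}

to_millicpu 0.5    # prints 500
to_millicpu 500m   # prints 500
to_millicpu 4      # prints 4000
to_millicpu 20m    # prints 20
```

Seen this way, a request of 200m asks for a fifth of one CPU and 20m for a fiftieth, which is why lowering the request lets the Pod fit on a 2 vCPU node that is already busy.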
    # Resources of the prometheus-k8s StatefulSet; the change has to be made again after a restart
    containers:
      - name: prometheus
        resources:
          limits:
            cpu: '4'
            memory: 16Gi
          requests:
            cpu: 200m      # change to 20m
            memory: 400Mi  # change to 40Mi

Running
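    kubectl top node
reports "error: Metrics API not available".
Solution: if KubeSphere is not installed yet, edit the KubeSphere deployment configuration file. If it is already installed, log in to KubeSphere and make the change under CRDs (Custom Resource Definitions) -> ClusterConfiguration -> ks-installer.
    metrics_server:
      enabled: false   # set to true

calico/node is not ready: felix is not ready: readiness probe reporting 503
After retrying: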
calico/node is not ready: felix is not ready: Get "http://localhost:9099/readiness": dial tcp [::1]:9099: connect: connection refused
Appendix
config-kubernetes-1.23.7.yaml deployment file
    apiVersion: kubekey.kubesphere.io/v1alpha2
config-kubesphere3.3.0-kubernetes1.23.7.yaml deployment file
    apiVersion: kubekey.kubesphere.io/v1alpha2
add-node.yaml
    apiVersion: kubekey.kubesphere.io/v1alpha2