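The situation: one Master node ran into trouble and had its OS reinstalled. After k8s was installed again, the node could not join the cluster no matter what, and the join hung at
[check-etcd] Checking that the etcd cluster is healthy
A completely new server does not run into this problem.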
Problem analysis
The server environment is as follows:
Node | IP |
---|---|
Master-1 | 192.168.1.12 |
Master-2 (the server whose OS was reinstalled) | 192.168.1.13 |
Master-3 | 192.168.1.14 |
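After some investigation, I found that when another Master node tries to join, the existing etcd cluster still believes it has three etcd members:
- https://192.168.1.12:2379
- https://192.168.1.13:2379 (this host had its OS reinstalled without going through the normal member removal procedure, so the etcd cluster thinks it still exists)
- https://192.168.1.14:2379
In reality that system was wiped, so the health check keeps trying to reach a member that no longer exists and the join hangs at this step. The fix is simply to remove the https://192.168.1.13:2379 member from the cluster.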
Solution
- First, on a Master node that is still healthy, find the etcd Pods and open a shell in any one of them
$ kubectl get pod -A | grep etcd
kube-system etcd-k8s-m1 1/1 Running 0 55d
kube-system etcd-k8s-m3 1/1 Running 0 55d
$ kubectl exec -it etcd-k8s-m1 -n kube-system -- sh
#
- Remove the faulty 192.168.1.13:2379 member
The --endpoints value here is the IP of the host machine.
For convenience, you can also set an alias:
alias etcdctl='etcdctl --endpoints=https://192.168.1.12:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key'
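As an alternative to repeating the flags, etcdctl can also read its connection settings from ETCDCTL_-prefixed environment variables. The following is a minimal sketch that reuses the endpoint and certificate paths from this article; it assumes an etcdctl that speaks the v3 API by default (as the --cacert flag used here suggests).

```shell
# Sketch: configure etcdctl through environment variables instead of flags.
# The values are the endpoint and certificate paths used in this article.
export ETCDCTL_ENDPOINTS=https://192.168.1.12:2379
export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
export ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt
export ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key

# With these exported, the subcommands can be run without the long flags:
#   etcdctl member list
```

Some etcdctl versions refuse to run when the same setting is given both as an environment variable and as a flag, so it is safest to use one style or the other within a session.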
# etcdctl --endpoints=https://192.168.1.12:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key member list
af634fcc13032524, started, k8s-m3, https://192.168.1.14:2380, https://192.168.1.14:2379, false
dc3a12f61883fabb, started, k8s-m2, https://192.168.1.13:2380, https://192.168.1.13:2379, false
fbdbdeb3498ee39d, started, k8s-m1, https://192.168.1.12:2380, https://192.168.1.12:2379, false
# etcdctl --endpoints=https://192.168.1.12:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key member remove dc3a12f61883fabb
Member dc3a12f61883fabb removed from cluster f94ea6bfe4b4166c
# etcdctl --endpoints=https://192.168.1.12:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key member list
af634fcc13032524, started, k8s-m3, https://192.168.1.14:2380, https://192.168.1.14:2379, false
fbdbdeb3498ee39d, started, k8s-m1, https://192.168.1.12:2380, https://192.168.1.12:2379, false
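Copying the member ID by hand works, but the lookup can also be scripted. The sketch below uses a hypothetical helper, find_member_id (not part of etcdctl), which reads member list output in the comma-separated format shown above and prints the ID of the member whose peer URL contains a given IP. It runs against the listing from this article, so it can be tried without a cluster.

```shell
#!/bin/sh
# Hypothetical helper (not part of etcdctl): print the ID of the etcd member
# whose peer URL (4th column) contains the given IP. Reads the default
# comma-separated `etcdctl member list` output from stdin.
find_member_id() {
    awk -F', ' -v ip="$1" 'index($4, "://" ip ":") > 0 { print $1 }'
}

# Demo input: the member listing shown in this article.
sample='af634fcc13032524, started, k8s-m3, https://192.168.1.14:2380, https://192.168.1.14:2379, false
dc3a12f61883fabb, started, k8s-m2, https://192.168.1.13:2380, https://192.168.1.13:2379, false
fbdbdeb3498ee39d, started, k8s-m1, https://192.168.1.12:2380, https://192.168.1.12:2379, false'

# Look up the member that belonged to the reinstalled host.
stale_id=$(printf '%s\n' "$sample" | find_member_id 192.168.1.13)
echo "stale member: $stale_id"
```

Against a live cluster, the sample text would be replaced by piping the real member list output into find_member_id, and the resulting ID passed to member remove. It is worth checking that exactly one ID comes back before removing anything.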
After that, running join again on Master-2 works without any problem.