A botched k8s config change left kubectl unusable - V2EX
  •   dunhanson 2021-09-15 15:53:47 +08:00 2878 views

    Everything is a mess right now and I don't know how to fix it.

    kubelet reports this error:

    Sep 15 15:51:54 master kubelet[94723]: E0915 15:51:54.473811 94723 kubelet.go:2183] node "master" not found

    I upgraded a single-node master to high availability. The HA setup has problems, so I just want to go back to a single-node master.

    Addendum 1    2021-09-15 16:25:38 +08:00
    Addendum 2    2021-09-15 16:25:49 +08:00

    4ZZUeI.png

    18 replies    2021-09-16 13:49:22 +08:00
    #1  cyaki  2021-09-15 16:15:56 +08:00
    How was the cluster deployed? kubeadm? hyperkube? Or was each component installed by hand?
    Is the container runtime docker, containerd, or cri-o?

    It looks like kubelet cannot reach the apiserver. Could you post the apiserver logs?

    A similar issue: https://github.com/kubernetes/kubeadm/issues/1153
    #2  dunhanson (OP)  2021-09-15 16:19:00 +08:00
    @cyaki kubeadm
    #3  dunhanson (OP)  2021-09-15 16:19:56 +08:00
    [root@master ~]# kubectl get nodes
    The connection to the server 192.168.2.53:6443 was refused - did you specify the right host or port?
    [root@master ~]# systemctl status kubelet
    ● kubelet.service - kubelet: The Kubernetes Node Agent
    Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
    Drop-In: /usr/lib/systemd/system/kubelet.service.d
    └─10-kubeadm.conf
    Active: active (running) since Thu 2021-09-16 00:04:24 CST; 7h left
    Docs: https://kubernetes.io/docs/
    Main PID: 977 (kubelet)
    Tasks: 18 (limit: 49767)
    Memory: 130.8M
    CGroup: /system.slice/kubelet.service
    └─977 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.2

    Sep 15 16:13:29 master kubelet[977]: E0915 16:13:29.846153 977 kubelet.go:2183] node "master" not found
    Sep 15 16:13:29 master kubelet[977]: E0915 16:13:29.946387 977 kubelet.go:2183] node "master" not found
    Sep 15 16:13:30 master kubelet[977]: E0915 16:13:30.046577 977 kubelet.go:2183] node "master" not found
    Sep 15 16:13:30 master kubelet[977]: E0915 16:13:30.146836 977 kubelet.go:2183] node "master" not found
    Sep 15 16:13:30 master kubelet[977]: E0915 16:13:30.246931 977 kubelet.go:2183] node "master" not found
    Sep 15 16:13:30 master kubelet[977]: E0915 16:13:30.347092 977 kubelet.go:2183] node "master" not found
    Sep 15 16:13:30 master kubelet[977]: E0915 16:13:30.447178 977 kubelet.go:2183] node "master" not found
    Sep 15 16:13:30 master kubelet[977]: I0915 16:13:30.462614 977 kubelet_node_status.go:70] Attempting to register node master
    Sep 15 16:13:30 master kubelet[977]: E0915 16:13:30.547409 977 kubelet.go:2183] node "master" not found
    Sep 15 16:13:30 master kubelet[977]: E0915 16:13:30.647751 977 kubelet.go:2183] node "master" not found
    [root@master ~]#
    #4  dunhanson (OP)  2021-09-15 16:20:31 +08:00
    @cyaki I'm using docker.
    #5  dunhanson (OP)  2021-09-15 16:25:14 +08:00
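    #6  cyaki  2021-09-15 16:29:19 +08:00
    Is kubectl also failing with "connection refused"?
    If the apiserver is running and the certificates are fine, kubectl should be able to connect.
    Test the certificates, or check whether the apiserver logs show certificate errors or anything similar:
    http --verify pki/ca.pem --cert pki/cert.pem --cert-key pki/cert-key.pem https://192.168.2.53:6443/version

    The address in https://192.168.2.53:6443/version must be one of the addresses included in the apiserver TLS certificate.

    By the way
    Looking at the logs above, the apiserver has already gone down, so you need to find out why. Even when kubelet cannot reach the apiserver, kubelet should still be able to start it (kubeadm uses kubelet to run the apiserver inside docker).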
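
A quick way to confirm the "connection refused" symptom without going through kubectl is a plain TCP probe of the apiserver port. The following is only a minimal sketch: it assumes Python 3 is available on the node, the function name `apiserver_port_open` is made up for illustration, and 192.168.2.53:6443 is simply the address from the kubectl error earlier in the thread.

```python
import socket


def apiserver_port_open(host: str, port: int = 6443, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeouts and unreachable hosts all end up here.
        return False


# Example call (address taken from the kubectl error in this thread):
# apiserver_port_open("192.168.2.53")
```

A `False` result only says that nothing is accepting connections on that port; it does not explain why the apiserver container exited, which still has to come from the container's own logs.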
    #7  dunhanson (OP)  2021-09-15 16:32:33 +08:00
    @cyaki The apiserver is down.
    #8  dunhanson (OP)  2021-09-15 16:32:56 +08:00
    I'll dig into why exactly the apiserver went down.
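    #9  zanelee  2021-09-15 17:17:16 +08:00
    From the screenshot, the apiserver is down and its container has exited. It must have printed an error when it tried to start, so you need to look at that error message.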
    #10  zen9073  2021-09-15 17:56:31 +08:00
    etcd can run as a single node,
    but once it has been scaled out, it cannot be shrunk back to a single node.
    #11  flybluewolf  2021-09-15 18:12:10 +08:00
    After deploying k8s HA with kubeadm, on every master you need to manually edit kube-controller-manager.yaml and kube-scheduler.yaml under /etc/kubernetes/manifests
    # Delete
    --port=0 (this flag turns off listening on the insecure HTTP port)
    # Change
    --bind-address=0.0.0.0

    Edit etcd.yaml
    --listen-metrics-urls=http://0.0.0.0:2381
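
The flag changes described in this reply can be illustrated on a plain argument list. This is just a sketch of the transformation, not a tool for editing the real manifests: the function name `adjust_flags` and the sample command are hypothetical, and in practice the flags live in the `command:` list of the YAML files under /etc/kubernetes/manifests.

```python
from typing import List


def adjust_flags(command: List[str]) -> List[str]:
    """Return a copy of the command with --port=0 dropped and
    any --bind-address flag set to 0.0.0.0."""
    result = []
    for arg in command:
        if arg == "--port=0":
            continue  # the "Delete" step from the reply
        if arg.startswith("--bind-address="):
            arg = "--bind-address=0.0.0.0"  # the "Change" step from the reply
        result.append(arg)
    return result


# Hypothetical sample command, for illustration only.
sample = ["kube-scheduler", "--bind-address=127.0.0.1", "--port=0", "--leader-elect=true"]
print(adjust_flags(sample))
```

The input list is left untouched, so the original flags can be compared against the result before anything is written back to a manifest.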
    #12  dunhanson (OP)  2021-09-15 18:14:33 +08:00
    @zen9073 This is the error:
    opBackOff: "back-off 5m0s restarting failed container=kube-apiserver pod=kube-apiserver-master_kube-system(2521a1e32c7f366d38f88fe227ff6710)"
    Sep 15 18:13:43 master kubelet[940]: E0915 18:13:43.018221 940 pod_workers.go:191] Error syncing pod 2521a1e32c7f366d38f88fe227ff6710 ("kube-apiserver-master_kube-system(2521a1e32c7f366d38f88fe227ff6710)"), skipping: failed to "StartContainer" for "kube-apiserver" with CrashLoopBackOff: "back-off 5m0s restarting failed container=kube-apiserver pod=kube-apiserver-master_kube-system(2521a1e32c7f366d38f88fe227ff6710)"
    Sep 15 18:13:57 master kubelet[940]: E0915 18:13:57.019782 940 pod_workers.go:191] Error syncing pod 2521a1e32c7f366d38f88fe227ff6710 ("kube-apiserver-master_kube-system(2521a1e32c7f366d38f88fe227ff6710)"), skipping: failed to "StartContainer" for "kube-apiserver" with CrashLoopBackOff: "back-off 5m0s restarting failed container=kube-apiserver pod=kube-apiserver-master_kube-system(2521a1e32c7f366d38f88fe227ff6710)"
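
When the kubelet journal is full of lines like these, it can help to pull out just which container is crash-looping and how long the back-off is. A minimal sketch, assuming the log format shown in this thread; the regular expression and the name `parse_backoff` are illustrative and not part of any Kubernetes tooling.

```python
import re
from typing import Optional, Tuple

# Matches the CrashLoopBackOff part of a kubelet log line as pasted above.
BACKOFF_RE = re.compile(
    r'CrashLoopBackOff: "back-off (?P<delay>\S+) restarting failed '
    r'container=(?P<container>\S+) pod=(?P<pod>[^\s(]+)',
    re.IGNORECASE,
)


def parse_backoff(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (container, pod, delay) for a CrashLoopBackOff line, else None."""
    match = BACKOFF_RE.search(line)
    if match is None:
        return None
    return match.group("container"), match.group("pod"), match.group("delay")


line = (
    'skipping: failed to "StartContainer" for "kube-apiserver" with '
    'CrashLoopBackOff: "back-off 5m0s restarting failed container=kube-apiserver '
    'pod=kube-apiserver-master_kube-system(2521a1e32c7f366d38f88fe227ff6710)"'
)
print(parse_backoff(line))
```

This only identifies the failing container. The reason for the crash is in the container's own output, which is what the later replies ask for.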
    #13  dolphintwo  2021-09-15 18:28:48 +08:00
    We need more detailed apiserver logs.
    #14  qinxi  2021-09-16 10:12:07 +08:00
    Open up ssh and let @defunct9 log in and take a look.
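    #15  zen9073  2021-09-16 11:31:59 +08:00
    @dunhanson
    I ran into the same problem before.
    kube-apiserver would not start because etcd was not up,
    and etcd was not up for the reason I gave earlier.
    For now, restore the original multi-master configuration,
    back up etcd, and then redeploy a single-node k8s from that etcd backup.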
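
The "back up etcd" step in this reply could look roughly like the sketch below, which builds an `etcdctl snapshot save` invocation. Everything specific in it is an assumption rather than something stated in the thread: the endpoint https://127.0.0.1:2379, the kubeadm default certificate directory /etc/kubernetes/pki/etcd, the snapshot path, and the helper names `snapshot_save_command` and `run_snapshot`.

```python
import os
import subprocess
from typing import List


def snapshot_save_command(
    snapshot_path: str,
    endpoint: str = "https://127.0.0.1:2379",    # assumed local etcd endpoint
    pki_dir: str = "/etc/kubernetes/pki/etcd",   # assumed kubeadm default location
) -> List[str]:
    """Build the etcdctl command line for saving a snapshot."""
    return [
        "etcdctl",
        f"--endpoints={endpoint}",
        f"--cacert={pki_dir}/ca.crt",
        f"--cert={pki_dir}/server.crt",
        f"--key={pki_dir}/server.key",
        "snapshot", "save", snapshot_path,
    ]


def run_snapshot(snapshot_path: str) -> None:
    """Run the snapshot command with the v3 API selected; raises on failure."""
    env = dict(os.environ, ETCDCTL_API="3")
    subprocess.run(snapshot_save_command(snapshot_path), env=env, check=True)


# Print the command for review; the path here is a hypothetical example.
print(" ".join(snapshot_save_command("/tmp/etcd-backup.db")))
```

Building the command separately from running it makes it easy to review what will be executed before touching a cluster that is already in a bad state.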
    #16  defunct9  2021-09-16 11:37:35 +08:00
    I thought I heard someone calling me.
    #17  dunhanson (OP)  2021-09-16 13:49:05 +08:00
    @dolphintwo @qinxi @zen9073 @defunct9 In the end I just reinstalled everything from scratch. It took until past 4 a.m.
    #18  dunhanson (OP)  2021-09-16 13:49:22 +08:00
    Sigh. I'll be more careful next time.