Error log
Pod cannot be created
[root@master-62 ~]# kubectl get po --all-namespaces -owide |grep -v "Running"
liangmingb test-1858107638-nrw6d 0/1 ContainerCreating 0 1h <none> slave-143
[root@master-62 ~]# kubectl describe po test-1858107638-nrw6d -nliangmingb
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
1h 5s 1762 kubelet, slave-143 Warning FailedSync Error syncing pod
1h 4s 1762 kubelet, slave-143 Normal SandboxChanged Pod sandbox changed, it will be killed and re-created
The kubelet log on the node shows the following errors:
12月 11 20:07:09 slave-143 kubelet[23490]: WARNING:1211 20:07:09.376601 23490 cni.go:258] CNI failed to retrieve network namespace path: Error: No such container: 9adaef08d85b827e78600cc0a170df27617481442463962f3f30495ce878cc1f
12月 11 20:07:09 slave-143 kubelet[23490]: ERROR:1211 20:07:09.622003 23490 docker_sandbox.go:239] Failed to stop sandbox "9adaef08d85b827e78600cc0a170df27617481442463962f3f30495ce878cc1f": Error response from daemon: {"message":"No such container: 9adaef08d85b827e78600cc0a170df27617481442463962f3f30495ce878cc1f"}
12月 11 20:07:09 slave-143 kubelet[23490]: ERROR
12月 11 20:07:09 slave-143 kubelet[23490]: ERROR
12月 11 20:07:09 slave-143 kubelet[23490]: ERROR
12月 11 20:07:09 slave-143 kubelet[23490]: ERROR:1211 20:07:09.878784 23490 remote_runtime.go:91] RunPodSandbox from runtime service failed: rpc error: code = 2 desc = failed to start sandbox container for pod "test-1858107638-nrw6d": Error response from daemon: {"message":"grpc: the connection is unavailable"}
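To judge how often the node hits this error, the log can be filtered for the gRPC message shown above. The following is a minimal sketch; the function name count_grpc_unavailable is hypothetical, and the journalctl usage in the comment assumes the kubelet runs as a systemd unit named kubelet.

```shell
# Count log lines that contain the gRPC "connection is unavailable" error.
# Reads log text on stdin, so it works with journalctl output or a saved file.
# Assumed usage (unit name "kubelet" is an assumption):
#   journalctl -u kubelet --since "1 hour ago" | count_grpc_unavailable
count_grpc_unavailable() {
    # grep -c prints 0 and exits non-zero when nothing matches;
    # "|| true" keeps the function's exit status at 0 in that case.
    grep -c 'grpc: the connection is unavailable' || true
}
```

A non-zero count that keeps growing suggests the node is still in the broken state described in this note.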
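Impact of this error
# When the kubelet reports this error, the effects are as follows
1. Existing pods on the affected node can be neither stopped nor deleted
Note: the existing pods themselves keep working; their IP addresses can still be pinged from other nodes
2. Pods newly scheduled to the node cannot be created either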
[root@slave-143 ~]# docker ps |grep filebeat
07646329caa3 reg.enncloud.cn/enncloud/filebeat@sha256:8869c3fcd0eadfe6202407b06eec8e672f37de3d093031bc01c03e5736e842d9 "./run.sh" 34 hours ago Up 34 hours k8s_filebeat_filebeat-hvrz9_kube-system_79e95933-fc20-11e8-885a-5254c2cdf2fd_0
9ab9c478ea3d reg.enncloud.cn/enncloud/pause-amd64:3.0 "/pause" 34 hours ago Up 34 hours k8s_POD_filebeat-hvrz9_kube-system_79e95933-fc20-11e8-885a-5254c2cdf2fd_0
[root@slave-143 ~]# docker stop 07646329caa3
Error response from daemon: Cannot stop container 07646329caa3: Cannot kill container 07646329caa3e77b64f76707df5f69242f753c678a9c344dc83ff25cdd0cdb2f: rpc error: code = 14 desc = grpc: the connection is unavailable
[root@slave-143 ~]# docker rm -f 07646329caa3
Error response from daemon: Could not kill running container 07646329caa3e77b64f76707df5f69242f753c678a9c344dc83ff25cdd0cdb2f, cannot remove - Cannot kill container 07646329caa3e77b64f76707df5f69242f753c678a9c344dc83ff25cdd0cdb2f: rpc error: code = 14 desc = grpc: the connection is unavailable
# In other words, evicting pods with kubectl drain does not work either, because the pods cannot be deleted at all
Solution
systemctl restart docker
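The restart can be wrapped with a basic follow-up check. This is a sketch only, assuming Docker is managed by systemd as the command above implies; the run helper and the DRY_RUN variable are hypothetical names introduced here so the steps can be previewed without touching the daemon.

```shell
# Print the command instead of executing it when DRY_RUN=1 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

restart_docker_and_check() {
    run systemctl restart docker || return 1
    # Confirms the unit came back up. It does not by itself prove that
    # the gRPC connection behind "docker stop" is healthy again.
    run systemctl is-active docker || return 1
}
```

After running restart_docker_and_check on the node, re-running the kubectl get po command from the first section on the master shows whether the pod has left ContainerCreating.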
Related reading on Docker, containerd, and runC
https://www.infoq.cn/article/2017%2F02%2FDocker-Containerd-RunC