docker - Debugging DNS resolution in Kubernetes

I initialized a Kubernetes v1.13.1 cluster on Ubuntu 16.04 with the following command:

sudo kubeadm init --token-ttl=0 --apiserver-advertise-address=192.168.88.142

and installed Weave with:

kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"

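I have 10 Raspberry Pis acting as worker nodes, all joined to the cluster, and all of them are running a deployment. The pods on these nodes try to connect to the IoT hub visdwk.azure-devices.net and publish some data. Of the 10 nodes, only a few are able to connect; the others throw an error saying they cannot connect to the IoT hub. I ran a ping test and found that the failing nodes cannot ping Google by hostname, even though they can ping Google's public IP address.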

This made me suspect that something is wrong with the CoreDNS pods. I followed this documentation to test them.

The pod has the following in /etc/resolv.conf:

nameserver 10.96.0.10
search visdwk.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
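For reference, here is a minimal sketch of how a glibc-style resolver would expand a hostname under this resolv.conf. The function name candidate_names is hypothetical, and other resolver libraries (musl, for example) may order or skip these queries differently, so treat the output as an approximation of what CoreDNS will see.

```python
def candidate_names(name, search, ndots):
    """List the fully qualified names a glibc-style resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as absolute: the search list is not used.
        return [name]
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = name + "."
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is first, then fall back to the search list.
        return [absolute] + expanded
    # Too few dots: walk the search list first, then try the name as-is.
    return expanded + [absolute]


if __name__ == "__main__":
    search = ["visdwk.svc.cluster.local", "svc.cluster.local", "cluster.local"]
    for fqdn in candidate_names("visdwk.azure-devices.net", search, ndots=5):
        print(fqdn)
```

Because visdwk.azure-devices.net has only two dots, the cluster-suffixed names are tried first, so NXDOMAIN answers for those names in the CoreDNS logs are expected. Only the final, absolute query is the one that has to succeed.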

The resolv.conf looks normal, and all the CoreDNS pods are running fine:

coredns-86c58d9df4-42xqc               1/1     Running   8         1d11h
coredns-86c58d9df4-p6d98               1/1     Running   7         1d6h

I also ran nslookup kubernetes.default from a busybox container and got a correct response. Here are the logs of coredns-86c58d9df4-42xqc:

.:53
2019-02-08T08:40:10.038Z [INFO] CoreDNS-1.2.6
2019-02-08T08:40:10.039Z [INFO] linux/amd64, go1.11.2, 756749c
CoreDNS-1.2.6
linux/amd64, go1.11.2, 756749c
 [INFO] plugin/reload: Running configuration MD5 = f65c4821c8a9b7b5eb30fa4fbc167769

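The logs above also look normal.

I also doubt that Weave is the reason the pods cannot resolve the IoT hub: if Weave were throwing errors, I believe the pods would never start and would stay in a failed state, but in fact they are in the Running state. Please correct me if I am wrong.

The DNS service also appears to be up: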

NAME                   TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)         AGE
kube-dns               ClusterIP   10.96.0.10     <none>        53/UDP,53/TCP   1d6h

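But I still cannot figure out why some of the nodes in the cluster cannot resolve the IoT hub. Can anyone give me some advice? Please help. Thanks.

Logs from a failing pod: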

1550138544: New connection from 127.0.0.1 on port 1883.
1550138544: New client connected from 127.0.0.1 as 6f1e2c4f-c44d-4c27-b9a9-0fb91f816504 (c1, k60).
1550138544: Sending CONNACK to 6f1e2c4f-c44d-4c27-b9a9-0fb91f816504 (0, 0)
1550138544: Received PUBLISH from 6f1e2c4f-c44d-4c27-b9a9-0fb91f816504 (d0, q0, r0, m0, 'devices/machine6/messages/events/', ... (1211 bytes))
1550138544: Received DISCONNECT from 6f1e2c4f-c44d-4c27-b9a9-0fb91f816504
1550138544: Client 6f1e2c4f-c44d-4c27-b9a9-0fb91f816504 disconnected.
1550138547: Saving in-memory database to /mqtt/data/mosquitto.db.
1550138547: Bridge local.machine6 doing local SUBSCRIBE on topic devices/machine6/messages/events/#
1550138547: Connecting bridge iothub-bridge (visdwk.azure-devices.net:8883)
1550138552: Error creating bridge: Try again.
1550138566: New connection from 127.0.0.1 on port 1883.
1550138566: New client connected from 127.0.0.1 as afb6cc2a-ee78-482e-aff0-fc595e06f86a (c1, k60).
1550138566: Sending CONNACK to afb6cc2a-ee78-482e-aff0-fc595e06f86a (0, 0)
1550138566: Received PUBLISH from afb6cc2a-ee78-482e-aff0-fc595e06f86a (d0, q0, r0, m0, 'devices/machine6/messages/events/', ... (1211 bytes))
1550138566: Received DISCONNECT from afb6cc2a-ee78-482e-aff0-fc595e06f86a
1550138566: Client afb6cc2a-ee78-482e-aff0-fc595e06f86a disconnected.
1550138567: New connection from 127.0.0.1 on port 1883.
1550138567: New client connected from 127.0.0.1 as 01b9e135-fbc8-4d67-9962-356e8cf9f080 (c1, k60).
1550138567: Sending CONNACK to 01b9e135-fbc8-4d67-9962-356e8cf9f080 (0, 0)
1550138567: Received PUBLISH from 01b9e135-fbc8-4d67-9962-356e8cf9f080 (d0, q0, r0, m0, 'devices/machine6/messages/events/', ... (755 bytes))
1550138567: Received DISCONNECT from 01b9e135-fbc8-4d67-9962-356e8cf9f080
1550138567: Client 01b9e135-fbc8-4d67-9962-356e8cf9f080 disconnected.
1550138578: Saving in-memory database to /mqtt/data/mosquitto.db.
1550138583: Bridge local.machine6 doing local SUBSCRIBE on topic devices/machine6/messages/events/#
1550138583: Connecting bridge iothub-bridge (visdwk.azure-devices.net:8883)
1550138588: Error creating bridge: Try again.

The pod runs a mosquitto container that tries to connect to visdwk.azure-devices.net and throws this error:

Connecting bridge iothub-bridge (visdwk.azure-devices.net:8883)
Error creating bridge: Try again.
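The "Try again" text resembles the message that some C libraries (musl, as an assumption about the container image) report for EAI_AGAIN, a temporary name-resolution failure, which would be consistent with a DNS problem rather than a broker or TLS problem. The following is a minimal sketch for checking this from inside a failing pod, assuming Python is available there. The helper name classify_lookup is hypothetical.

```python
import socket


def classify_lookup(host, port, resolver=socket.getaddrinfo):
    """Resolve host:port and classify the outcome.

    Returns 'ok', 'temporary-failure', 'not-found', or 'other-error'.
    The resolver argument exists only so the function can be exercised offline.
    """
    try:
        resolver(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        if exc.errno == socket.EAI_AGAIN:
            # The DNS server did not answer, or answered with a temporary error.
            return "temporary-failure"
        if exc.errno == socket.EAI_NONAME:
            # The DNS server answered, and the name does not exist.
            return "not-found"
        return "other-error"
    return "ok"
```

Calling classify_lookup("visdwk.azure-devices.net", 8883) on a working node and on a failing node should make the difference visible: a 'temporary-failure' result points at the resolver path (CoreDNS or the route to it), not at the IoT hub itself.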

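Best answer: It looks like one of your DNS pods is not serving DNS.

The evidence is in the statement that "only a few nodes can connect, while the others throw an error saying they cannot connect to the IoT hub".

This is the typical symptom of round-robin load balancing with a failed backend in the rotation.

Try: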

>Delete the DNS server pod that is logging the message: visdwk.azure-devices.net.visdwknamespace.svc.cluster.local. udp 82 false 512" NXDOMAIN qr,aa,rd,ra 175 0.000651078s where visdwk.azure-devices.net
>Wait for the change to propagate through the cluster.
>Test the connections.

If this is correct, they should all connect.

To confirm, add the pod back and delete the other one. Retest, and they should all fail to connect.
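Before deleting anything, each CoreDNS pod can also be queried directly, bypassing the kube-dns service IP, to see which one misbehaves. The sketch below is an illustration under assumptions: build_query, response_code and query_server are hypothetical helpers, only the DNS header is inspected, and the pod IPs have to be supplied by you (for example from kubectl get pods -n kube-system -o wide). It has to run somewhere that can reach pod IPs, such as a cluster node or another pod.

```python
import socket
import struct
import sys

RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name, txid=0x1234, qtype=1):
    """Build a minimal DNS query (RFC 1035): one question, type A, recursion desired."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels)
    return header + qname + b"\x00" + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN


def response_code(response):
    """Extract the 4-bit RCODE from the DNS header flags."""
    return response[3] & 0x0F


def query_server(server_ip, name, timeout=2.0):
    """Send one UDP query straight to server_ip and summarise the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server_ip, 53))
        try:
            data, _ = sock.recvfrom(512)
        except socket.timeout:
            return "TIMEOUT"
    if len(data) < 12:
        return "MALFORMED"
    code = response_code(data)
    return RCODES.get(code, f"RCODE {code}")


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python check_dns.py <hostname> <coredns-pod-ip> [<coredns-pod-ip> ...]
    hostname, servers = sys.argv[1], sys.argv[2:]
    for ip in servers:
        print(ip, query_server(ip, hostname))
```

If one pod IP returns TIMEOUT or SERVFAIL for visdwk.azure-devices.net while the other returns NOERROR, that supports the hypothesis above and tells you which pod to delete first.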
