Running kube-dns on a bare-metal Kubernetes cluster!
2019/02/09

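I referred to https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/, but for some reason name resolution sometimes worked and sometimes didn't...
I found the cause and a fix, so I've written them up here!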
Goal
Run the nslookup command in a Kubernetes Pod and get an A record back for a Service name.
Prerequisites
A Kubernetes cluster already built on physical machines. If you haven't built one yet, see:
Building a Kubernetes environment with Ansible, part 1
Building a Kubernetes environment with Ansible, part 2
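To check the goal stated above, one option is a throwaway Pod that runs nslookup against a Service name. This is a minimal sketch, assuming the cluster can pull `busybox:1.28` (the image used in older Kubernetes DNS debugging docs); the function name and the Pod name `dns-check` are hypothetical, and `kubernetes.default` (the API server's Service) is just a convenient default.

```shell
# Hypothetical helper: resolve a Service name from a throwaway Pod.
# --rm deletes the Pod after it exits; -i attaches so nslookup's output is shown.
# Assumes the busybox:1.28 image is pullable from the cluster.
check_service_dns() {
  local name="${1:-kubernetes.default}"
  kubectl run dns-check --rm -i --restart=Never --image=busybox:1.28 -- nslookup "$name"
}
```

When kube-dns is healthy, the output should include an Address line for the Service's cluster IP. Running it several times in a row is a simple way to catch intermittent failures like the ones described in this post.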
Out of scope
How to use Kubernetes
Explanations of basic Kubernetes concepts (things like "what is a Pod?")
Environment
Physical machines: Ubuntu 18.04
Kubernetes 1.10
Symptom
The kube-dns Pod is running:
```
$ kubectl get pod | grep dns
kube-dns-857cdff9c4-ttlzz   3/3   Running   0   8d
```
Even so, the kube-dns Pod has restarted roughly 500 times in two weeks... why?
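Because the Pod runs three containers, it can help to see which one is actually restarting. A sketch, assuming kubectl's JSONPath output: the first function lists each container's name and restartCount, and a hypothetical awk filter picks the highest. Both function names are made up for illustration.

```shell
# List "<container>\t<restartCount>" for every container in the kube-dns Pod(s).
kube_dns_restart_counts() {
  kubectl get pods -n kube-system -l k8s-app=kube-dns \
    -o jsonpath='{range .items[*].status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\n"}{end}'
}

# Hypothetical helper: read "<name>\t<count>" lines on stdin and print the
# container with the highest restart count as "<name> <count>".
busiest_container() {
  awk -F'\t' 'NF == 2 && (name == "" || $2 + 0 > max) { max = $2 + 0; name = $1 }
              END { if (name != "") print name, max }'
}
```

Usage: `kube_dns_restart_counts | busiest_container`.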
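Investigation
So I started digging. It turns out the kube-dns Pod runs the following three containers:
kubedns (which embeds the SkyDNS server), dnsmasq, and sidecar
Let's check the logs of all three, starting with the dnsmasq container: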
```
$ kubectl logs --namespace=kube-system $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name) -c dnsmasq
I0419 17:44:35.785171 1 main.go:76] opts: {{/usr/sbin/dnsmasq [-k --cache-size=1000 --no-negcache --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053] true} /etc/k8s/dns/dnsmasq-nanny 10000000000}
I0419 17:44:35.785336 1 nanny.go:94] Starting dnsmasq [-k --cache-size=1000 --no-negcache --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053]
I0419 17:44:35.876534 1 nanny.go:119]
W0419 17:44:35.876572 1 nanny.go:120] Got EOF from stdout
I0419 17:44:35.876578 1 nanny.go:116] dnsmasq[26]: started, version 2.78 cachesize 1000
I0419 17:44:35.876615 1 nanny.go:116] dnsmasq[26]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
I0419 17:44:35.876632 1 nanny.go:116] dnsmasq[26]: using nameserver 127.0.0.1#10053 for domain ip6.arpa
I0419 17:44:35.876642 1 nanny.go:116] dnsmasq[26]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa
I0419 17:44:35.876653 1 nanny.go:116] dnsmasq[26]: using nameserver 127.0.0.1#10053 for domain cluster.local
I0419 17:44:35.876666 1 nanny.go:116] dnsmasq[26]: reading /etc/resolv.conf
I0419 17:44:35.876677 1 nanny.go:116] dnsmasq[26]: using nameserver 127.0.0.1#10053 for domain ip6.arpa
I0419 17:44:35.876691 1 nanny.go:116] dnsmasq[26]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa
I0419 17:44:35.876701 1 nanny.go:116] dnsmasq[26]: using nameserver 127.0.0.1#10053 for domain cluster.local
I0419 17:44:35.876709 1 nanny.go:116] dnsmasq[26]: using nameserver 127.0.0.53#53
I0419 17:44:35.876717 1 nanny.go:116] dnsmasq[26]: read /etc/hosts - 7 addresses
```
Next, the kubedns container:
```
$ kubectl logs --namespace=kube-system $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name) -c kubedns
I0419 17:40:11.473047 1 dns.go:48] version: 1.14.8
I0419 17:40:11.473975 1 server.go:71] Using configuration read from directory: /kube-dns-config with period 10s
I0419 17:40:11.474024 1 server.go:119] FLAG: --alsologtostderr="false"
I0419 17:40:11.474032 1 server.go:119] FLAG: --config-dir="/kube-dns-config"
I0419 17:40:11.474037 1 server.go:119] FLAG: --config-map=""
I0419 17:40:11.474041 1 server.go:119] FLAG: --config-map-namespace="kube-system"
I0419 17:40:11.474044 1 server.go:119] FLAG: --config-period="10s"
I0419 17:40:11.474049 1 server.go:119] FLAG: --dns-bind-address="0.0.0.0"
I0419 17:40:11.474053 1 server.go:119] FLAG: --dns-port="10053"
I0419 17:40:11.474058 1 server.go:119] FLAG: --domain="cluster.local."
I0419 17:40:11.474063 1 server.go:119] FLAG: --federations=""
I0419 17:40:11.474067 1 server.go:119] FLAG: --healthz-port="8081"
I0419 17:40:11.474071 1 server.go:119] FLAG: --initial-sync-timeout="1m0s"
I0419 17:40:11.474074 1 server.go:119] FLAG: --kube-master-url=""
I0419 17:40:11.474079 1 server.go:119] FLAG: --kubecfg-file=""
I0419 17:40:11.474082 1 server.go:119] FLAG: --log-backtrace-at=":0"
I0419 17:40:11.474087 1 server.go:119] FLAG: --log-dir=""
I0419 17:40:11.474091 1 server.go:119] FLAG: --log-flush-frequency="5s"
I0419 17:40:11.474094 1 server.go:119] FLAG: --logtostderr="true"
I0419 17:40:11.474098 1 server.go:119] FLAG: --nameservers=""
I0419 17:40:11.474101 1 server.go:119] FLAG: --stderrthreshold="2"
I0419 17:40:11.474104 1 server.go:119] FLAG: --v="2"
I0419 17:40:11.474107 1 server.go:119] FLAG: --version="false"
I0419 17:40:11.474113 1 server.go:119] FLAG: --vmodule=""
I0419 17:40:11.474190 1 server.go:201] Starting SkyDNS server (0.0.0.0:10053)
I0419 17:40:11.488125 1 server.go:220] Skydns metrics enabled (/metrics:10055)
I0419 17:40:11.488170 1 dns.go:146] Starting endpointsController
I0419 17:40:11.488180 1 dns.go:149] Starting serviceController
I0419 17:40:11.488348 1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I0419 17:40:11.488407 1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
I0419 17:40:11.988549 1 dns.go:170] Initialized services and endpoints from apiserver
I0419 17:40:11.988609 1 server.go:135] Setting up Healthz Handler (/readiness)
I0419 17:40:11.988641 1 server.go:140] Setting up cache handler (/cache)
I0419 17:40:11.988649 1 server.go:126] Status HTTP port 8081
```
Finally, the sidecar container:
```
$ kubectl logs --namespace=kube-system $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name) -c sidecar
ERROR: logging before flag.Parse: I1020 11:43:05.394883 1 main.go:48] Version v1.14.4-2-g5584e04
ERROR: logging before flag.Parse: I1020 11:43:05.394935 1 server.go:45] Starting server (options {DnsMasqPort:53 DnsMasqAddr:127.0.0.1 DnsMasqPollIntervalMs:5000 Probes:[{Label:kubedns Server:127.0.0.1:10053 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:1} {Label:dnsmasq Server:127.0.0.1:53 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:1}] PrometheusAddr:0.0.0.0 PrometheusPort:10054 PrometheusPath:/metrics PrometheusNamespace:kubedns})
ERROR: logging before flag.Parse: I1020 11:43:05.394965 1 dnsprobe.go:75] Starting dnsProbe {Label:kubedns Server:127.0.0.1:10053 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:1}
ERROR: logging before flag.Parse: I1020 11:43:05.394995 1 dnsprobe.go:75] Starting dnsProbe {Label:dnsmasq Server:127.0.0.1:53 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:1}
ERROR: logging before flag.Parse: W1020 11:43:22.399309 1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:50271->127.0.0.1:53: i/o timeout
ERROR: logging before flag.Parse: W1020 11:43:29.399631 1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:50260->127.0.0.1:53: i/o timeout
ERROR: logging before flag.Parse: W1020 11:43:36.399957 1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:43683->127.0.0.1:53: i/o timeout
ERROR: logging before flag.Parse: W1020 11:43:43.400251 1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:38300->127.0.0.1:53: i/o timeout
ERROR: logging before flag.Parse: W1020 11:43:50.400500 1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:45071->127.0.0.1:53: i/o timeout
ERROR: logging before flag.Parse: W1020 11:44:04.187002 1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:60537->127.0.0.1:53: i/o timeout
```
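Two things in these logs stand out: dnsmasq reports using nameserver 127.0.0.53#53, i.e. it took systemd-resolved's stub address from /etc/resolv.conf as its upstream, and the sidecar's probes of dnsmasq on 127.0.0.1:53 time out. The kube-dns Pod normally runs with dnsPolicy: Default, so it inherits the node's resolv.conf; however, 127.0.0.53 inside the Pod's own network namespace appears to refer to the Pod's loopback rather than to the node's systemd-resolved. A hypothetical helper (the name is made up) to spot such a loopback nameserver in a resolv.conf file:

```shell
# Hypothetical helper: succeed (exit 0) if any "nameserver" entry in the given
# resolv.conf is an IPv4 loopback address (127.0.0.0/8), e.g. systemd-resolved's
# stub at 127.0.0.53. Defaults to /etc/resolv.conf.
has_loopback_nameserver() {
  local conf="${1:-/etc/resolv.conf}"
  awk '$1 == "nameserver" && $2 ~ /^127\./ { found = 1 } END { exit !found }' "$conf"
}
```

Running it against the node's /etc/resolv.conf before and after the fix in the next section is one way to confirm the change took effect.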
Solution
To borrow the wisdom of those who came before, I combed through GitHub issues and found one that matched exactly:
DNS not working
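The environment in that issue is Ubuntu 17, but the symptoms were nearly identical, so I tried its fix: stop the systemd-resolved service and disable it so it does not start at boot.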
```
$ sudo systemctl stop systemd-resolved
$ sudo systemctl disable systemd-resolved
```
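To confirm that both commands took effect, one could compare what systemctl is-active and systemctl is-enabled print. A minimal sketch; the helper name is hypothetical, and it relies only on the printed state because both subcommands also exit non-zero for an inactive or disabled unit.

```shell
# Hypothetical helper: report whether systemd-resolved is both stopped and
# not enabled at boot. "|| true" keeps the non-zero exit codes that
# is-active/is-enabled return for inactive/disabled units from aborting "set -e" scripts.
resolved_is_off() {
  local active enabled
  active="$(systemctl is-active systemd-resolved || true)"
  enabled="$(systemctl is-enabled systemd-resolved || true)"
  echo "is-active: ${active}, is-enabled: ${enabled}"
  [ "$active" != "active" ] && [ "$enabled" != "enabled" ]
}
```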
Then edit /etc/resolv.conf. Before:
```
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
# 127.0.0.53 is the systemd-resolved stub resolver.
# run "systemd-resolve --status" to see details about the actual nameservers.

nameserver 127.0.0.53
```
After:
```
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
# 127.0.0.53 is the systemd-resolved stub resolver.
# run "systemd-resolve --status" to see details about the actual nameservers.

nameserver 8.8.8.8
```
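The edit above can also be scripted. A minimal sketch, assuming GNU sed; the function name is hypothetical and 8.8.8.8 is simply the server chosen in this post. On Ubuntu, /etc/resolv.conf is often a symlink into /run; GNU sed -i (without --follow-symlinks) writes a new regular file in place of the symlink instead of editing the link target.

```shell
# Hypothetical helper: replace the systemd-resolved stub (127.0.0.53) with another
# nameserver in a resolv.conf-style file. Assumes GNU sed; -i without
# --follow-symlinks replaces a symlinked file with a regular file.
replace_stub_nameserver() {
  local conf="$1"       # e.g. /etc/resolv.conf
  local upstream="$2"   # e.g. 8.8.8.8, as used in this post
  sed -i "s/^nameserver 127\.0\.0\.53\$/nameserver ${upstream}/" "$conf"
}
```

Run it as root against /etc/resolv.conf. Since kubelet hands each Pod its resolv.conf when the Pod is created, the existing kube-dns Pod presumably needs to be recreated afterwards, e.g. with kubectl -n kube-system delete pod -l k8s-app=kube-dns.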
Discussion
This barely counts as analysis... but I looked into systemd-resolved a little.
https://kledgeb.blogspot.com/2016/06/ubuntu-1610-7-dnssystemd-resolved.html
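systemd-resolved seems to act merely as a local stub resolver and does not appear to do anything problematic by itself. Digging a little further through the issues, this looks like simply a bug:
https://github.com/kubernetes/kubeadm/issues/273
Besides stopping systemd-resolved as described in this post, there seems to be another fix:
Passing a resolv.conf file to kubelet through its startup options (the --resolv-conf flag) apparently solves it as well.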
https://github.com/kubernetes/kubernetes/issues/45828
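A minimal sketch of that alternative, assuming a kubeadm-style setup where the kubelet systemd unit expands $KUBELET_EXTRA_ARGS (as kubeadm's 10-kubeadm.conf drop-in does) and nothing else needs that variable. The drop-in file name 20-resolv-conf.conf and the helper name are hypothetical. /run/systemd/resolve/resolv.conf is where systemd-resolved lists the actual upstream servers, so this variant keeps systemd-resolved running.

```shell
# Hypothetical helper: write a systemd drop-in that passes --resolv-conf to kubelet.
# Assumes the kubelet unit's ExecStart includes $KUBELET_EXTRA_ARGS. Because drop-ins
# are read in lexical order, this overrides KUBELET_EXTRA_ARGS from earlier files.
write_kubelet_resolv_dropin() {
  local dropin_dir="$1"   # e.g. /etc/systemd/system/kubelet.service.d
  local resolv="$2"       # e.g. /run/systemd/resolve/resolv.conf
  mkdir -p "$dropin_dir"
  cat > "$dropin_dir/20-resolv-conf.conf" <<EOF
[Service]
Environment="KUBELET_EXTRA_ARGS=--resolv-conf=${resolv}"
EOF
}
```

After running it as root, sudo systemctl daemon-reload and sudo systemctl restart kubelet should apply the flag; Pods created earlier presumably keep their old resolv.conf, so the kube-dns Pod would need to be recreated too.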
If you'd like to learn more about Docker or Kubernetes, feel free to contact me through Coconala!