August 2020

When deploying the metrics-server component on Kubernetes 1.18.x, no metrics could be collected.

Symptoms:

  • kubectl top nodes reports the following errors:

    Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
    Error from server (ServiceUnavailable): the server is currently unable to handle the request (get pods.metrics.k8s.io)

Check the kube-apiserver logs:
kubectl -n kube-system logs -f kube-apiserver-master-1 --tail 10

E0826 04:25:10.111976       1 available_controller.go:420] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.108.243.54:443/apis/metrics.k8s.io/v1beta1: Get https://10.108.243.54:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E0826 04:25:15.112635       1 available_controller.go:420] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.108.243.54:443/apis/metrics.k8s.io/v1beta1: Get https://10.108.243.54:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
W0826 04:25:37.762783       1 handler_proxy.go:102] no RequestInfo found in the context
E0826 04:25:37.762890       1 controller.go:114] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
, Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]
I0826 04:25:37.762915       1 controller.go:127] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
E0826 04:25:41.763211       1 available_controller.go:420] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.108.243.54:443/apis/metrics.k8s.io/v1beta1: Get https://10.108.243.54:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E0826 04:25:46.764318       1 available_controller.go:420] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.108.243.54:443/apis/metrics.k8s.io/v1beta1: Get https://10.108.243.54:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E0826 04:26:11.763745       1 available_controller.go:420] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.108.243.54:443/apis/metrics.k8s.io/v1beta1: Get https://10.108.243.54:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E0826 04:26:16.764339       1 available_controller.go:420] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.108.243.54:443/apis/metrics.k8s.io/v1beta1: Get https://10.108.243.54:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E0826 04:26:41.763920       1 available_controller.go:420] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.108.243.54:443/apis/metrics.k8s.io/v1beta1: Get https://10.108.243.54:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E0826 04:26:46.764574       1 available_controller.go:420] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.108.243.54:443/apis/metrics.k8s.io/v1beta1: Get https://10.108.243.54:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

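These logs do not reveal the cause on their own: they only show kube-apiserver timing out when it contacts https://10.108.243.54:443 for the metrics.k8s.io/v1beta1 API.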
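To narrow down a timeout like this, it can help to look at the aggregated API registration and at the Service behind it. The following is a minimal sketch, not part of the original investigation. It assumes metrics-server was installed from the upstream manifests, so the kube-system namespace, the Service name metrics-server and the label k8s-app=metrics-server are assumptions. check_metrics_api is a hypothetical helper name.

```shell
# Hypothetical helper: prints the objects involved in serving metrics.k8s.io.
# Assumes the upstream metrics-server manifests (namespace, Service name and
# label below are assumptions; adjust them to your deployment).
check_metrics_api() {
  # Is the aggregated API registered, and does it report Available=True?
  kubectl get apiservice v1beta1.metrics.k8s.io

  # ClusterIP of the Service (compare it with the IP in the apiserver log)
  kubectl -n kube-system get service metrics-server -o wide

  # Pod IPs that actually back the Service
  kubectl -n kube-system get endpoints metrics-server

  # Is the metrics-server Pod running, and on which node?
  kubectl -n kube-system get pods -l k8s-app=metrics-server -o wide
}
```

Run check_metrics_api from a machine with cluster access. If the endpoints exist and the Pod is running, but kube-apiserver still times out against the ClusterIP, the path from the master node to Service IPs is the likely suspect.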

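Final fix: add the --enable-aggregator-routing=true startup flag to kube-apiserver. I do not know the root cause: my other cluster, running k8s 1.16.x, works fine without this flag. If anyone knows the reason, please let me know. Thanks in advance.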
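For reference, the kube-apiserver flag documentation describes --enable-aggregator-routing as routing aggregator requests to endpoint IPs rather than to the cluster IP. That matches the logs above, where the timeouts are against a Service IP. How the flag is added depends on how kube-apiserver is launched. The sketch below assumes a kubeadm-style static Pod manifest, which the pod name kube-apiserver-master-1 suggests but does not prove. It edits a throwaway sample file, and the manifest excerpt and address in it are made up for illustration.

```shell
#!/bin/sh
# Sketch only: works on a temporary sample file, not on a real cluster.
manifest="$(mktemp)"

# Hypothetical excerpt of a kubeadm-style kube-apiserver static Pod manifest
cat > "$manifest" <<'EOF'
spec:
  containers:
  - command:
    - kube-apiserver
    - --advertise-address=192.0.2.10
    - --secure-port=6443
EOF

flag='--enable-aggregator-routing=true'

# Insert the flag right after the "- kube-apiserver" entry, keeping the
# indentation, unless some value for the flag is already present.
if ! grep -q -- '--enable-aggregator-routing=' "$manifest"; then
  awk -v flag="$flag" '
    { print }
    /^ *- kube-apiserver$/ {
      sub(/- kube-apiserver$/, "- " flag)
      print
    }
  ' "$manifest" > "$manifest.new"
  mv "$manifest.new" "$manifest"
fi

cat "$manifest"
```

On a kubeadm cluster the real file is typically /etc/kubernetes/manifests/kube-apiserver.yaml on each master node, and the kubelet recreates the kube-apiserver Pod when that file changes. Back up the file before editing it.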

PS: Some articles online say this flag is only needed when the master node is not running kube-proxy, but the master nodes in my cluster do run kube-proxy.

Reference 1: https://github.com/kubernetes-sigs/metrics-server/issues/448
Reference 2: https://blog.z0ukun.com/?p=1462