Konubinix' opinionated web of thoughts

What Is v1beta1.metrics.k8s.io and What Does False (MissingEndpoints) Mean

Fleeting

what is v1beta1.metrics.k8s.io and what does False (MissingEndpoints) means kubernetes

ktl create namespace testns
namespace/testns created
ktl delete namespaces testns
namespace "testns" deleted

But the deletion hangs…

ktl describe namespace testns
Name:         testns
Labels:       kubernetes.io/metadata.name=testns
Annotations:  <none>
Status:       Terminating
Conditions:
  Type                                         Status  LastTransitionTime               Reason                  Message
  ----                                         ------  ------------------               ------                  -------
  NamespaceDeletionDiscoveryFailure            True    Mon, 14 Nov 2022 14:39:09 +0100  DiscoveryFailed         Discovery failed for some groups, 1 failing: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
  NamespaceDeletionGroupVersionParsingFailure  False   Mon, 14 Nov 2022 14:39:09 +0100  ParsedGroupVersions     All legacy kube types successfully parsed
  NamespaceDeletionContentFailure              False   Mon, 14 Nov 2022 14:39:09 +0100  ContentDeleted          All content successfully deleted, may be waiting on finalization
  NamespaceContentRemaining                    False   Mon, 14 Nov 2022 14:39:09 +0100  ContentRemoved          All content successfully removed
  NamespaceFinalizersRemaining                 False   Mon, 14 Nov 2022 14:39:09 +0100  ContentHasNoFinalizers  All content-preserving finalizers finished

No resource quota.

No LimitRange resource.
ktl get apiservices.apiregistration.k8s.io v1beta1.metrics.k8s.io
NAME                     SERVICE                      AVAILABLE                  AGE
v1beta1.metrics.k8s.io   kube-system/metrics-server   False (MissingEndpoints)   4d20h
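
Since one unavailable aggregated API is enough to break discovery and block the namespace deletion, it helps to list every APIService that is not available. This is a minimal sketch, assuming the default column layout shown above. The helper name `unavailable_apiservices` is mine, not a kubectl feature.

```shell
# Print the APIServices whose AVAILABLE column is not "True".
# Reads the output of `kubectl get apiservices` on stdin (hypothetical helper).
unavailable_apiservices() {
    # Column 3 is AVAILABLE; NR > 1 skips the header line.
    awk 'NR > 1 && $3 != "True" { print $1 }'
}

# Usage against a live cluster (not run here):
#   kubectl get apiservices | unavailable_apiservices
```

Each name it prints is a candidate for the same investigation as the one done here for v1beta1.metrics.k8s.io.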

It seems to have been a problem with resources, I only had one node in the cluster with only 4GB memory and 2 vCPU, after scale problem solved. Thanks.

https://github.com/Azure/secrets-store-csi-driver-provider-azure/issues/167

There is indeed a metrics-server running in this cluster.

ktl get -n kube-system deployment metrics-server
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
metrics-server   0/1     1            0           4d21h
ktl describe -n kube-system deployment metrics-server|sed -n '/Condition/,/^[^ ]/p'
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      False   MinimumReplicasUnavailable
  Progressing    False   ProgressDeadlineExceeded
OldReplicaSets:  <none>

The associated pod fails to become ready.

ktl get pod -n kube-system metrics-server-ff9dbcb6c-qgr4c
NAME                             READY   STATUS    RESTARTS   AGE
metrics-server-ff9dbcb6c-qgr4c   0/1     Running   0          4d21h
ktl -n kube-system describe pod metrics-server-ff9dbcb6c-qgr4c|sed -n '/Events/,$p'
Events:
  Type     Reason     Age                         From     Message
  ----     ------     ----                        ----     -------
  Warning  Unhealthy  3m14s (x210750 over 4d21h)  kubelet  Readiness probe failed: HTTP probe failed with statuscode: 500
ktl -n kube-system logs metrics-server-ff9dbcb6c-qgr4c|head -n 1
E1114 10:11:35.797461       1 scraper.go:139] "Failed to scrape node" err="Get \"https://XXXXXXX/stats/summary?only_cpu_and_memory=true\": x509: certificate is valid for 127.0.0.1, YYYYYYYYYYY, not XXXXXXXXX" node="ZZZZZZZZZZZ"

The node has a certificate that works for localhost and its public address, but not its private address.
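
The x509 error can be double-checked by looking at the Subject Alternative Names of the certificate the kubelet serves. This is a rough sketch, assuming the textual output of `openssl x509 -noout -text`. The helper `cert_has_ip_san` and the `NODE_IP` placeholder are made up for the illustration.

```shell
# Succeed if the certificate text on stdin lists the given IP as a SAN.
# Expects the output of `openssl x509 -noout -text` (hypothetical helper).
cert_has_ip_san() {
    ip="$1"
    # SAN entries look like "IP Address:127.0.0.1" and are separated by ", ".
    grep -A1 'Subject Alternative Name' \
        | tr ',' '\n' \
        | sed 's/^ *//; s/ *$//' \
        | grep -qxF "IP Address:$ip"
}

# Usage from a machine that can reach the node (NODE_IP is a placeholder):
#   openssl s_client -connect "$NODE_IP:10250" </dev/null 2>/dev/null \
#     | openssl x509 -noout -text \
#     | cert_has_ip_san "$NODE_IP" && echo covered || echo "not covered"
```

The exact line match avoids a false positive where, for instance, 10.0.0.4 would match a certificate issued for 10.0.0.40.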

It looks like this advice might work.

Simply run:

kubectl patch deployment metrics-server -n kube-system --type 'json' -p '[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'

Make sure to adjust the namespace accordingly, e.g. default or kube-system

https://github.com/kubernetes-sigs/metrics-server/issues/525

It would be good to check whether we could simply use the http connection rather than the https one, instead of artificially asking to be insecure.

ktl exec -t -i -n kube-system metrics-server-ff9dbcb6c-qgr4c -- sh
error: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "a3422eeac47b032cfcd6631f6b27cd95831278c4839ceea57156f81d382e84be": OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: "ls": executable file not found in $PATH: unknown

Tried with bash, sh and even ls.

Chances are this is a distroless image.

ktl get -n kube-system pod metrics-server-ff9dbcb6c-qgr4c --output json|jq '.spec.containers[0].image'
"rancher/mirrored-metrics-server:v0.5.2"
clk docker image show-rootfs rancher/mirrored-metrics-server:v0.5.2|gi bin/
bin/
sbin/
usr/bin/
usr/sbin/
usr/sbin/tzconfig

Indeed, it contains no binary other than metrics-server.

It's a good opportunity to try debug containers.

ktl -n kube-system  debug metrics-server-ff9dbcb6c-qgr4c -ti --image=busybox
error: ephemeral containers are disabled for this cluster (error from server: "the server could not find the requested resource").

Well, it was worth trying.

Let’s create another pod then.

ktl -n kube-system create deployment --image=alpine debug -- sleep 3600
deployment.apps/debug created
$ ktl exec -t -i -n kube-system debug-789bb8d684-c5kwc -- sh

# apk add curl
# curl -k https://XXXXXXXX:10250/stats/summary?only_cpu_and_memory=true
Unauthorized
# curl -k http://XXXXXXXX:10250/stats/summary?only_cpu_and_memory=true
Client sent an HTTP request to an HTTPS server.

Well, now, that sounds silly. I should try to find the equivalent http port. Actually, I found out that port 10250 is the kubelet API itself and that there is no http equivalent.

I can either accept the "insecure" solution or find out how to have the nodes' certificates cover the private IP address.

It is a Hetzner cloud, set up with the hcloud terraform provider. I should look at this to see whether there is an option.

I guess this is dealt with by https://github.com/hetznercloud/hcloud-cloud-controller-manager, but I could not find any option about the private ip of the nodes in the private network.

Somehow, the installation of k3s only creates a certificate for the node's external IP but not for its internal IP. Yet the metrics server it deploys uses the InternalIP and not the ExternalIP.

Another lead is to make the metrics server use the public IP, the one for which the certificate was actually issued.

Let’s take a break and summarize what I know so far.

  • our hcloud setup installs k3s on the nodes, without disabling the metrics server
  • we use a private network, set as InternalIP by the hcloud cloud manager
  • but the hcloud cloud manager generates a certificate that is only valid for the ExternalIP

It seems that the metrics server deployed by k3s is the official one [1]. At least the command line seems the same.

Trying with kubelet-preferred-address-types, I changed the deployment to use only the external IP address (using --kubelet-preferred-address-types=ExternalIP).

ktl -n kube-system logs metrics-server-7cfcd7bb46-fhjfn | gi scrape
E1114 16:18:17.167019       1 scraper.go:139] "Failed to scrape node" err="Get \"https://YYYYYYY:10250/stats/summary?only_cpu_and_memory=true\": context deadline exceeded" node="ZZZZZZZZZZZZZ"

The IP is indeed the external one, but the node does not actually seem to listen on that IP address, leading to a timeout.

Putting the default address types back and adding --kubelet-insecure-tls instead, there is no more error and the pod finally gets ready.

ktl -n kube-system get pod metrics-server-55cb6f95b5-8n2fn
NAME                              READY   STATUS    RESTARTS   AGE
metrics-server-55cb6f95b5-8n2fn   1/1     Running   0          83s

The API finally gets ready

ktl get apiservices.apiregistration.k8s.io v1beta1.metrics.k8s.io
NAME                     SERVICE                      AVAILABLE   AGE
v1beta1.metrics.k8s.io   kube-system/metrics-server   True        4d23h

And my namespace can be deleted once and for all.

ktl get namespace testns
Error from server (NotFound): namespaces "testns" not found

That is not ideal, as this flag should be used for testing purposes only.

Let’s see anyway how to persist this configuration.

The manifest installing the metrics server provides a way to change kubelet-preferred-address-types, but not to add the --kubelet-insecure-tls flag. Actually, I cannot find kubelet-insecure-tls anywhere in the source code of k3s.

Therefore I see no choice but to patch the deployment afterwards, using:

Simply run:

kubectl patch deployment metrics-server -n kube-system --type 'json' -p '[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'

Make sure to adjust the namespace accordingly, e.g. default or kube-system

https://github.com/kubernetes-sigs/metrics-server/issues/525
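
Because the JSON patch appends to the args list, running it twice would add the flag twice. Here is a small sketch of a guard that makes the patch safe to re-run. The helper name `flag_missing` is hypothetical, and the deployment and container index are the ones from this cluster.

```shell
# Succeed if the given flag is absent from the container args read on stdin,
# one argument per line (hypothetical helper).
flag_missing() {
    # -e is needed because the pattern itself starts with dashes.
    ! grep -qxF -e "$1"
}

# Usage against a live cluster (not run here):
#   kubectl -n kube-system get deployment metrics-server \
#       -o jsonpath='{.spec.template.spec.containers[0].args[*]}' \
#     | tr ' ' '\n' \
#     | flag_missing --kubelet-insecure-tls \
#     && kubectl -n kube-system patch deployment metrics-server --type json \
#          -p '[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'
```

With such a guard, the patch could be part of a provisioning step that is replayed after each cluster setup.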

Now, let’s try to find a more elegant solution, one that would not require patching a deployment created by k3s.

I need to find a way to tell the CCM that the certificate of the kubelet running on each node should also cover its InternalIP, at least as a SAN.

Connecting to the node, I can run:

openssl x509  -noout -text -in /var/lib/rancher/k3s/agent/serving-kubelet.crt |grep Alternative -A1
X509v3 Subject Alternative Name:
   DNS:NAME, DNS:localhost, IP Address:127.0.0.1, IP Address:<ExternalIP>

I can try using the --tls-san option of k3s.

But I still need to have a way to know the internal ip to set it.

Knowing that I configured the internal ip to use the ip range 10.0.0.0/24 I can craft something like

ip a|grep -F 10.0.0.|sed -r 's/^.+(10\.0\.0\.[0-9]+).+$/\1/'
10.0.0.4

That could be run before starting k3s on the nodes.
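
A slightly more robust variant of that extraction could rely on the one-line output of `ip -o -4 addr show` and match the prefix literally rather than with a regular expression. This is only a sketch, assuming the private range is 10.0.0.0/24 as above. The helper name `internal_ip` is made up.

```shell
# Print the first IPv4 address that starts with the given dotted prefix.
# Reads the output of `ip -o -4 addr show` on stdin (hypothetical helper).
internal_ip() {
    prefix="$1"   # e.g. "10.0.0." for the 10.0.0.0/24 range
    awk -v p="$prefix" '
        {
            split($4, a, "/")            # field 4 is ADDRESS/MASK
            if (index(a[1], p) == 1) {   # literal prefix match, dots are not wildcards
                print a[1]
                exit
            }
        }'
}

# Usage on a node:
#   ip -o -4 addr show | internal_ip 10.0.0.
```

Reading only the address field avoids picking up the broadcast address printed on the same line.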


  1. Looking at the layers, we can see it basically gets the binary built from the metrics-server project.