What Is v1beta1.metrics.k8s.io and What Does False (MissingEndpoints) Mean
ktl create namespace testns
namespace/testns created
ktl delete namespaces testns
namespace "testns" deleted
But the deletion hangs…
ktl describe namespace testns
Name: testns
Labels: kubernetes.io/metadata.name=testns
Annotations: <none>
Status: Terminating
Conditions:
Type Status LastTransitionTime Reason Message
---- ------ ------------------ ------ -------
NamespaceDeletionDiscoveryFailure True Mon, 14 Nov 2022 14:39:09 +0100 DiscoveryFailed Discovery failed for some groups, 1 failing: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
NamespaceDeletionGroupVersionParsingFailure False Mon, 14 Nov 2022 14:39:09 +0100 ParsedGroupVersions All legacy kube types successfully parsed
NamespaceDeletionContentFailure False Mon, 14 Nov 2022 14:39:09 +0100 ContentDeleted All content successfully deleted, may be waiting on finalization
NamespaceContentRemaining False Mon, 14 Nov 2022 14:39:09 +0100 ContentRemoved All content successfully removed
NamespaceFinalizersRemaining False Mon, 14 Nov 2022 14:39:09 +0100 ContentHasNoFinalizers All content-preserving finalizers finished
No resource quota.
No LimitRange resource.
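The DiscoveryFailed condition explains the hang: the namespace controller cannot finish the deletion until it has enumerated every API group and verified that nothing is left in the namespace, and the aggregated metrics.k8s.io/v1beta1 group cannot be reached. A quick way to list the failing aggregated APIs (a sketch, using the same ktl alias as everywhere else):
ktl get apiservices.apiregistration.k8s.io | grep False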
ktl get apiservices.apiregistration.k8s.io v1beta1.metrics.k8s.io
NAME SERVICE AVAILABLE AGE
v1beta1.metrics.k8s.io kube-system/metrics-server False (MissingEndpoints) 4d20h
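False (MissingEndpoints) means that the APIService points at the Service kube-system/metrics-server, but that Service has no ready endpoints, so the aggregation layer has nowhere to forward requests for /apis/metrics.k8s.io/v1beta1. This can be confirmed with something like:
ktl get endpoints -n kube-system metrics-server
An empty ENDPOINTS column means no pod backing the service is ready.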
It seems to have been a problem with resources, I only had one node in the cluster with only 4GB memory and 2 vCPU, after scale problem solved. Thanks.
— https://github.com/Azure/secrets-store-csi-driver-provider-azure/issues/167
There is indeed a metrics-server deployed in this cluster.
ktl get -n kube-system deployment metrics-server
NAME READY UP-TO-DATE AVAILABLE AGE
metrics-server 0/1 1 0 4d21h
ktl describe -n kube-system deployment metrics-server|sed -n '/Condition/,/^[^ ]/p'
Conditions:
Type Status Reason
---- ------ ------
Available False MinimumReplicasUnavailable
Progressing False ProgressDeadlineExceeded
OldReplicaSets: <none>
The associated pod never became ready.
ktl get pod -n kube-system metrics-server-ff9dbcb6c-qgr4c
NAME READY STATUS RESTARTS AGE
metrics-server-ff9dbcb6c-qgr4c 0/1 Running 0 4d21h
ktl -n kube-system describe pod metrics-server-ff9dbcb6c-qgr4c|sed -n '/Events/,$p'
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 3m14s (x210750 over 4d21h) kubelet Readiness probe failed: HTTP probe failed with statuscode: 500
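To know what that probe actually hits, its definition can be printed from the deployment (a sketch; the exact path and port depend on the metrics-server version):
ktl get -n kube-system deployment metrics-server -o jsonpath='{.spec.template.spec.containers[0].readinessProbe}{"\n"}'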
ktl -n kube-system logs metrics-server-ff9dbcb6c-qgr4c|head -n 1
E1114 10:11:35.797461 1 scraper.go:139] "Failed to scrape node" err="Get \"https://XXXXXXX/stats/summary?only_cpu_and_memory=true\": x509: certificate is valid for 127.0.0.1, YYYYYYYYYYY, not XXXXXXXXX" node="ZZZZZZZZZZZ"
The node has a certificate that works for localhost and its public address, but not its private address.
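This can be double checked from outside the pod by looking at the certificate the kubelet actually serves (a sketch, assuming openssl is available on a machine that can reach the node; XXXXXXXXX stands for the node's internal address as in the log above):
openssl s_client -connect XXXXXXXXX:10250 </dev/null 2>/dev/null | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'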
It looks like this advice might work.
Simply run:
kubectl patch deployment metrics-server -n kube-system --type 'json' -p '[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]' Make sure to adjust the namespace accordingly, e.g. default or kube-system
— https://github.com/kubernetes-sigs/metrics-server/issues/525
It would be good to check whether we could simply use the http connection rather than the https one, instead of artificially asking to be insecure.
ktl exec -t -i -n kube-system metrics-server-ff9dbcb6c-qgr4c -- sh
error: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "a3422eeac47b032cfcd6631f6b27cd95831278c4839ceea57156f81d382e84be": OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: "ls": executable file not found in $PATH: unknown
Tried with bash, sh and even ls.
Chances are this is a distroless image.
ktl get -n kube-system pod metrics-server-ff9dbcb6c-qgr4c --output json|jq '.spec.containers[0].image'
"rancher/mirrored-metrics-server:v0.5.2"
clk docker image show-rootfs rancher/mirrored-metrics-server:v0.5.2|gi bin/
bin/
sbin/
usr/bin/
usr/sbin/
usr/sbin/tzconfig
Indeed, it contains no binaries other than metrics-server itself.
It's a good opportunity to try debug containers.
ktl -n kube-system debug metrics-server-ff9dbcb6c-qgr4c -ti --image=busybox
error: ephemeral containers are disabled for this cluster (error from server: "the server could not find the requested resource").
Well, it was worth trying.
Let’s create another pod then.
ktl -n kube-system create deployment --image=alpine debug -- sleep 3600
deployment.apps/debug created
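The generated pod name can be retrieved through the app=debug label that create deployment sets (a sketch):
ktl -n kube-system get pod -l app=debug -o name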
$ ktl exec -t -i -n kube-system debug-789bb8d684-c5kwc -- sh
# apk add curl
# curl -k https://XXXXXXXX:10250/stats/summary?only_cpu_and_memory=true
Unauthorized
# curl -k http://XXXXXXXX:10250/stats/summary?only_cpu_and_memory=true
Client sent an HTTP request to an HTTPS server.
Well, now, that sounds silly. I should try to find the equivalent http port. Actually, I found out that port 10250 is the kubelet's own HTTPS API port and that there is no http equivalent.
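One way to check whether the kubelet exposes the deprecated read-only HTTP port at all is to ask for its configuration through the API server proxy (a sketch; ZZZZZZZZZZZ stands for the node name as in the logs):
ktl get --raw /api/v1/nodes/ZZZZZZZZZZZ/proxy/configz | jq .kubeletconfig.readOnlyPort
A value of 0 (or a missing field) means the read-only port is disabled.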
I can either accept the "insecure" solution or try to find out how to make the node certificates also cover the private IP address.
It is a Hetzner cloud, set up with the hcloud Terraform provider. I should look at it to see whether there is an option.
I guess this is dealt with by https://github.com/hetznercloud/hcloud-cloud-controller-manager, but I could not find any option about the private IP of the nodes in the private network.
Somehow, the installation of k3s only creates a certificate for the node's external IP, not for its internal IP. Yet the metrics server it deploys uses the InternalIP and not the ExternalIP.
Another lead is to make the metrics server use the public IP, the one for which the certificate is actually issued.
Let’s take a break and summarize what I know so far.
- our hcloud setup installs k3s on the nodes, without disabling the metrics server
- we use a private network, set as InternalIP by the hcloud cloud manager (see the check just after this list)
- but the hcloud cloud manager generates a certificate that is only valid for the ExternalIP
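The InternalIP/ExternalIP split can be checked directly on the Node object (a sketch; ZZZZZZZZZZZ stands for the node name as in the logs):
ktl get node ZZZZZZZZZZZ -o jsonpath='{.status.addresses}{"\n"}'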
It seems that the metrics server deployed by k3s is the official one[1]. At least the command line seems the same.
Trying with kubelet-preferred-address-types, I changed the deployment to use only the external IP address (using --kubelet-preferred-address-types=ExternalIP).
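For reference, one way to apply that change without editing the manifest by hand is a JSON patch similar to the one quoted above (a sketch; it assumes the args array does not already contain a --kubelet-preferred-address-types entry, in which case the existing value should be replaced rather than appended):
kubectl patch deployment metrics-server -n kube-system --type 'json' -p '[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-preferred-address-types=ExternalIP"}]'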
ktl -n kube-system logs metrics-server-7cfcd7bb46-fhjfn | gi scrape
E1114 16:18:17.167019 1 scraper.go:139] "Failed to scrape node" err="Get \"https://YYYYYYY:10250/stats/summary?only_cpu_and_memory=true\": context deadline exceeded" node="ZZZZZZZZZZZZZ"
The IP is indeed the external one, but the node does not actually seem to listen on that address, leading to a timeout.
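A quick way to confirm the timeout from the debug pod created earlier (where curl was already installed) is to retry with an explicit connection deadline; YYYYYYY is the node's external address as in the log above:
curl -k --connect-timeout 3 https://YYYYYYY:10250/stats/summary
Getting Unauthorized would mean the port is reachable; another timeout confirms the node does not accept kubelet connections on its external address.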
Putting the original address types back but adding kubelet-insecure-tls, there is no error and the pod finally gets ready.
ktl -n kube-system get pod metrics-server-55cb6f95b5-8n2fn
NAME READY STATUS RESTARTS AGE
metrics-server-55cb6f95b5-8n2fn 1/1 Running 0 83s
The API service finally becomes available.
ktl get apiservices.apiregistration.k8s.io v1beta1.metrics.k8s.io
NAME SERVICE AVAILABLE AGE
v1beta1.metrics.k8s.io kube-system/metrics-server True 4d23h
And my namespace can be deleted once and for all.
ktl get namespace testns
Error from server (NotFound): namespaces "testns" not found
That is not ideal, as --kubelet-insecure-tls is meant for testing purposes only.
Let’s see anyway how to persist this configuration.
The manifest installing the metrics server provides a way to change the kubelet-preferred-address-types, but not to add the --kubelet-insecure-tls flag. Actually, I cannot find kubelet-insecure-tls anywhere in the source code of k3s.
Therefore I see no choice but to patch the deployment afterwards, using:
Simply run:
kubectl patch deployment metrics-server -n kube-system --type 'json' -p '[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]' Make sure to adjust the namespace accordingly, e.g. default or kube-system
— https://github.com/kubernetes-sigs/metrics-server/issues/525
Now, let's try to find a more elegant solution that would not require patching a deployment created by k3s.
I need to find a way to tell the CCM that the certificate of the kubelet running on each node should include its InternalIP, at least as a SAN.
Connecting to the node, I can run:
openssl x509 -noout -text -in /var/lib/rancher/k3s/agent/serving-kubelet.crt |grep Alternative -A1
X509v3 Subject Alternative Name:
DNS:NAME, DNS:localhost, IP Address:127.0.0.1, IP Address:<ExternalIP>
I can try using the --tls-san option of k3s.
But I still need a way to know the internal IP in order to set it.
Knowing that I configured the internal network to use the IP range 10.0.0.0/24, I can craft something like:
ip a|grep 10.0.0|sed -r 's/^.+(10.0.0.[0-9]+).+$/\1/'
10.0.0.4
That could be run prior to starting k3s on the nodes.
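A minimal sketch of how that could be wired into the node provisioning, assuming the official get.k3s.io install script is used on a server node and that --tls-san actually ends up in the kubelet certificate (which remains to be verified):
INTERNAL_IP=$(ip a|grep 10.0.0|sed -r 's/^.+(10.0.0.[0-9]+).+$/\1/')
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server --tls-san ${INTERNAL_IP}" sh -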
[1] Looking at the layers, we can see it basically gets the binary built from the metrics-server.