Konubinix' opinionated web of thoughts

How I Debug My K8s Tests Not Running

Fleeting

Sometimes, the tests of clk k8s run forever, with a log like

+test | action: Getting node:None
+test | debug: Given the distribution kind, I inferred the context kind-clk-k8s
+test | action: run: kubectl --context kind-clk-k8s get node --namespace default --output json
+test | warning: Waited 780s for the node to be ready. It's been a long time now, something may be wrong. I'm still waiting for eternity
+test | warning: clk-k8s-control-plane: KubeletHasSufficientMemory, KubeletHasNoDiskPressure, KubeletHasSufficientPID, KubeletNotReady

The tests wait forever for the cluster to start.

In that case, I want to understand what keeps the cluster from becoming ready.

The layers I have to deal with are

  1. earthly runs the tests using WITH DOCKER. I can enter it using
    1. docker exec -ti earthly-buildkitd sh to enter the earthly builder docker container
    2. buildkit-runc list to find the running earthly job
    3. buildkit-runc exec -t o7mtrt511xmf9okk0pcgjbtqv bash to enter it
  2. once inside the WITH DOCKER layer, I can use
    1. docker ps to find the running containers
    2. docker exec -ti clk-k8s-control-plane bash to enter the running instance of kind if need be
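Looking up the job ID by hand can be scripted. Here is a minimal sketch, assuming that buildkit-runc list prints the usual runc table (a header line, then ID, PID, STATUS and further columns) and that only one job is running. The function name first_running_job is hypothetical, not part of clk or earthly.

```shell
# Print the ID of the first container whose STATUS column is "running".
# Reads the table printed by `buildkit-runc list` on stdin.
# Assumption: the first line is a header and the columns are ID, PID, STATUS, ...
first_running_job() {
    awk 'NR > 1 && $3 == "running" { print $1; exit }'
}

# Usage (needs a running earthly-buildkitd container):
#   job="$(docker exec earthly-buildkitd buildkit-runc list | first_running_job)"
#   docker exec -ti earthly-buildkitd buildkit-runc exec -t "$job" bash
```

If several jobs run at the same time, this picks the first one listed, which may not be the one running the tests.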

In the earthly job, I can use kubectl to query the cluster.

This can be made a one-liner to ease debugging, using a temporary clk alias:

clk alias set run exec -- docker exec earthly-buildkitd buildkit-runc exec o7mtrt511xmf9okk0pcgjbtqv
New global alias for run: exec docker exec earthly-buildkitd buildkit-runc exec o7mtrt511xmf9okk0pcgjbtqv

Then, I can investigate using kubectl.

clk run kubectl get node
NAME                    STATUS     ROLES                  AGE   VERSION
clk-k8s-control-plane   NotReady   control-plane,master   26m   v1.21.1

This confirms that the cluster is not ready.

Now, let’s dig deeper into why it is not ready.

clk run kubectl get node --output json | jq -r '.items[0].status.conditions[]|select(.reason == "KubeletNotReady").message'
container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized

This tells me that I should look into the calico installation.
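A possible first step is to list the pods that are not running. This is only a sketch, assuming that calico is deployed as pods in the cluster and that kubectl get pods --all-namespaces prints its usual table (NAMESPACE, NAME, READY, STATUS, ...). The function name not_running_pods is hypothetical.

```shell
# Print "namespace/name: status" for every pod that is neither Running nor Completed.
# Reads the table printed by `kubectl get pods --all-namespaces` on stdin.
# Assumption: the first line is a header and STATUS is the fourth column.
not_running_pods() {
    awk 'NR > 1 && $4 != "Running" && $4 != "Completed" { print $1 "/" $2 ": " $4 }'
}

# Usage, with the temporary alias defined earlier:
#   clk run kubectl get pods --all-namespaces | not_running_pods
```

Any calico pod showing up in that list would be the next thing to describe or read the logs of.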