How I Debug MTU Issues in K3d in Docker in Earthly in Docker in K8s

Earthly is awesome because it unlocks use cases that would otherwise be impossible. But it comes with several layers of abstraction that are worth understanding.

In the past, I have already run into issues when running Docker in containerized environments. The last one was that Docker does not check the MTU at startup.

Today I ran into a similar issue and had to dig through all the layers of abstraction I set up in the CI. I am taking the opportunity to write down what I learned.

When I need to test software in the context of k8s, my CI looks like this:

1. a virtual machine hosted on Hetzner,
2. running k3s, hence containerd,
3. running actions-runner-controller,
4. running earthly, hence earthly-buildkitd in docker,
5. running buildkit-runc,
6. running docker (because of WITH DOCKER) to run k3d,
7. running k3s, hence containerd,
8. running the pods I am interested in.

All those layers make sense and provide a comfortable development environment.

Layers 1-3 make the CI infrastructure easy to manage and allow fine-tuning horizontal scaling.

Layers 4-5 come with Earthly. They bring strong reproducibility, make it easier to organize one's code and artifacts, and free developers from having to adapt to the subtleties of each CI provider¹.

Layers 6-8 make it easy to test things reproducibly. Since there is a lot of code in the Helm charts and values, it is very useful to run tests this way².

But when something goes wrong and you need to dig in, things can get tricky.

This setup did not work out of the box: the cluster never started. So I needed to get into the running GitHub Actions job.

Because I had already run into MTU issues before, I started by looking at the MTU.

- ssh into the Hetzner machine -> ip a | grep mtu -> 1430
- kubectl exec -t -i myrunningrunner sh
  - ip a | grep mtu -> 1430
  - docker exec -ti earthly-buildkitd sh
    - ip a | grep mtu -> 1400 (because of the Docker MTU fix applied in actions-runner-controller)
    - buildkit-runc exec -t sw7b9iz0d6ulw7l2rlriozqaz bash (use buildkit-runc list to find the container)
      - apt install --yes iproute2
      - ip a | grep mtu -> 1400 for eth0, but 1500 for the docker0 bridge -> that's a bad sign
      - docker exec -ti clk-k8s-control-plane bash
        - ip a | grep mtu -> 1500, that sucks
        - ctr image pull -> error

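Checking every layer by hand is tedious. Below is a minimal sketch that automates the comparison in one layer. It assumes that /sys/class/net is readable in that layer and that eth0 is the uplink; check_mtu, mtu_of and SYSFS_NET are hypothetical names, not part of any tool used here.

```shell
#!/bin/sh
# Hypothetical helper: report interfaces whose MTU exceeds the uplink's.
# SYSFS_NET defaults to the real sysfs directory; override it for testing.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Print the MTU of the interface given as $1.
mtu_of() {
    cat "$SYSFS_NET/$1/mtu"
}

# Compare every interface against the uplink ($1, default eth0).
# Prints one line per offending interface; returns 1 if any was found,
# 2 if the uplink MTU cannot be read.
check_mtu() {
    uplink="${1:-eth0}"
    ref=$(mtu_of "$uplink") || return 2
    status=0
    for dir in "$SYSFS_NET"/*; do
        iface=$(basename "$dir")
        # Loopback legitimately has a huge MTU; skip it.
        [ "$iface" = lo ] && continue
        [ -r "$dir/mtu" ] || continue
        mtu=$(cat "$dir/mtu")
        if [ "$mtu" -gt "$ref" ]; then
            echo "$iface: mtu $mtu > $uplink mtu $ref"
            status=1
        fi
    done
    return "$status"
}
```

Sourced in the buildkit-runc container above, check_mtu eth0 should have flagged docker0 (1500 against 1400). Reading sysfs rather than parsing ip a avoids needing iproute2, which was missing in that container.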
There are two issues here:

First, the Docker daemon spawned by Earthly (through WITH DOCKER) did not use an appropriate MTU: 1500, which is larger than the 1400 of the network outside.

Unfortunately, Earthly currently provides no way to change it. However, it does not overwrite an existing Docker configuration.

Hence we added the following line just before WITH DOCKER:

RUN mkdir -p /etc/docker && echo '{"mtu": 1400}' > /etc/docker/daemon.json

That way, the spawned Docker daemon uses the appropriate MTU.
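Hardcoding 1400 works for this CI, but the value could also be read from the network itself. Here is a minimal sketch, assuming that the RUN step before WITH DOCKER sees the same eth0 MTU as the WITH DOCKER step (it did in the session above) and that daemon.json holds no other settings worth keeping; write_docker_mtu is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: write a daemon.json whose MTU matches an interface.
# $1: interface to copy the MTU from (default eth0)
# $2: path of the Docker daemon config (default /etc/docker/daemon.json)
# Like the one-liner above, this overwrites the file.
write_docker_mtu() {
    iface="${1:-eth0}"
    conf="${2:-/etc/docker/daemon.json}"
    # Fail instead of writing an empty value if the interface is missing.
    mtu=$(cat "${SYSFS_NET:-/sys/class/net}/$iface/mtu") || return 1
    mkdir -p "$(dirname "$conf")"
    printf '{"mtu": %s}\n' "$mtu" > "$conf"
}
```

Copied into a script, it would be called as write_docker_mtu eth0 in a RUN step just before WITH DOCKER, so the daemon follows whatever MTU the runner happens to have.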

The second problem is that k3d (with k3d registry create and --registry-use) spawns containers in a network without inferring the MTU, so Docker falls back to 1500 for that network, even though the Docker daemon itself is configured with an MTU of 1400.

I could extend clk k8s to create a network with the correct MTU using docker network create somename --opt com.docker.network.driver.mtu=1400, and then use k3d cluster create --network somename and k3d registry create --default-network somename.
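That network-based alternative would look roughly like this sketch. The network name somename comes from the text; the registry and cluster names and the DRY_RUN switch are hypothetical. The --registry-use wiring is left out because its argument depends on how the registry is exposed.

```shell
#!/bin/sh
# Print the command instead of running it when DRY_RUN is set (hypothetical switch).
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Create a bridge network with an explicit MTU, then put k3d on it.
# $1: network name, $2: MTU, $3: registry name, $4: cluster name
k3d_on_network() {
    net="$1"; mtu="$2"; registry="$3"; cluster="$4"
    # Without the option, Docker gives this new network the 1500 default.
    run docker network create --opt "com.docker.network.driver.mtu=$mtu" "$net"
    # Attach the registry and the cluster nodes to that network.
    run k3d registry create "$registry" --default-network "$net"
    run k3d cluster create "$cluster" --network "$net"
}
```

With DRY_RUN=1 set, k3d_on_network somename 1400 myregistry mycluster prints the three commands instead of running them, which is handy to check the flags before touching Docker.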

But I got pragmatic and used kind instead, which actually infers the appropriate MTU value: 1400.
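To confirm what kind picked, the MTU of its network can be read back from Docker. A sketch, assuming kind's default network name (kind) and that the MTU is recorded as the com.docker.network.driver.mtu option of that network; network_mtu and assert_network_mtu are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical helper: print the MTU option of a Docker network ($1, default kind).
# The index template yields an empty string when the option was never set.
network_mtu() {
    docker network inspect "${1:-kind}" \
        --format '{{index .Options "com.docker.network.driver.mtu"}}'
}

# Fail unless network $1 has MTU $2; returns 2 if docker itself fails.
assert_network_mtu() {
    actual=$(network_mtu "$1") || return 2
    if [ "$actual" != "$2" ]; then
        echo "network $1 has mtu '$actual', expected $2" >&2
        return 1
    fi
}
```

Running assert_network_mtu kind 1400 right after the cluster is created would catch a regression before any image pull fails inside the cluster.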


¹ The CI job becomes as simple as:

ci:
  runs-on: [self-hosted]
  steps:
  - uses: earthly/actions-setup@v1
    id: setup
    with:
      version: v0.6.29
  - run: earthly -P --push +ci


² In the Earthfile, I can test my code in a cluster with something like this:

myimage:
    FROM alpine
    ...
    SAVE IMAGE myremotename

ci:
    FROM earthly/dind:ubuntu
    RUN apt-get update && apt-get install --yes git wget python3-distutils python3-pytest
    RUN wget -O - https://clk-project.org/install.sh | env CLK_EXTENSIONS=k8s bash
    WORKDIR /app
    COPY --dir myfiles /app
    WITH DOCKER --load 127.0.0.1:5000/myimage:dev=+myimage
        RUN /app/test.sh
    END

And then, in the test script (I use clk k8s):

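#!/bin/bash
# Stop at the first failing step so the CI job reports the error
set -euo pipefail

# Set up the local registry and cluster with clk k8s
clk k8s flow
# Push the image loaded by WITH DOCKER into the local registry
docker push --quiet 127.0.0.1:5000/myimage:dev
# Fetch the chart dependencies, then install the chart with the dev image
helm dependency update /app/helm
helm install myapp /app/helm \
     --values /app/helm/values-dev.yaml \
     --set myapp.image.repository=localhost:5000/myimage \
     --set myapp.image.tag=dev

# Wait for the pods to be ready, then run the tests against them
kubectl wait pods -l app.kubernetes.io/name=myimage --for condition=Ready --timeout=2m
pytest /app/tests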
