Konubinix' site

How I Debug MTU Issues in K3d in Docker in Earthly in Docker in K8s

Fleeting

Earthly is awesome because it unlocks a bunch of use cases that were impossible otherwise. But it comes with several layers of abstraction that are worth understanding.

In the past, I have already run into issues when running docker in containerized environments. The last one came from the fact that docker does not check the MTU at startup.

I ran into a similar issue today and needed to dig deeper into all the layers of abstraction I set up in the CI. I am taking the opportunity to write down what I learned.

When I need to test software in the context of k8s, my CI looks like this:

  1. a virtual machine hosted on hetzner,
  2. running k3s, hence containerd,
  3. running an actions-runner-controller,
  4. running earthly, hence earthly-buildkitd in docker,
  5. running buildkit-runc,
  6. running docker (because of WITH DOCKER) to run k3d,
  7. running k3s, hence containerd,
  8. running the pods I am interested in.

All those layers make sense and provide a comfortable development environment.

1-3 make the handling of the CI infrastructure easy and allow fine-tuning horizontal scaling.

4-5 come with earthly. They bring high reproducibility, make it easier to organise one's code and artifacts, and free the developers from having to adapt to CI provider subtleties [1].

6-8 make it easy to test stuff in a reproducible way. Since there is a lot of code in the helm charts and values, it is very useful to run tests that way [2].

But when something goes wrong and you need to get into it, that may become a bit tricky.

This setup did not work out of the box. The cluster never started. Therefore I needed to get into the running github action.

Because I had already had issues with the MTU, I started by looking at it.

There are two issues here:

First, the docker spawned by earthly (using WITH DOCKER) did not have an appropriate MTU: it used 1500, which is greater than the 1400 of the outside network.
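A mismatch like this can be checked from inside any of the layers. The following is only a minimal sketch, assuming a Linux environment with sysfs and the iputils ping; the helper names list_mtus and max_ping_payload are made up for illustration, and example.org is a placeholder target.

```shell
#!/bin/sh
# Hypothetical helper: print "interface mtu" for every interface
# visible in the current network namespace (reads Linux sysfs).
list_mtus() {
    for dev in /sys/class/net/*; do
        printf '%s %s\n' "$(basename "$dev")" "$(cat "$dev/mtu")"
    done
}

# Hypothetical helper: largest ICMP echo payload that fits in a single
# IPv4 packet of the given MTU, i.e. the MTU minus 20 bytes of IPv4
# header and 8 bytes of ICMP header.
max_ping_payload() {
    echo $(( $1 - 28 ))
}

# Usage (example.org is a placeholder host):
#   list_mtus
#   # With fragmentation prohibited, this should pass on a 1400 path...
#   ping -M do -c 1 -s "$(max_ping_payload 1400)" example.org
#   # ...while this one is expected to fail or time out.
#   ping -M do -c 1 -s "$(max_ping_payload 1500)" example.org
```

Running list_mtus in each layer (the VM, the runner pod, the docker spawned by earthly) should show where the value jumps back to 1500.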

Unfortunately, the current code does not provide a means to change that MTU. But it does not overwrite whatever config already exists.

Hence we added the following line just before WITH DOCKER.

RUN mkdir -p /etc/docker && echo '{"mtu": 1400}' > /etc/docker/daemon.json

That way, the spawned docker has the appropriate mtu.

The second problem is that k3d (using k3d registry create and --registry-use) spawns a container in a network for which it does not guess the MTU. Docker then falls back on 1500 (even though docker itself is configured to use MTU 1400).
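To spot such networks, one could list the MTU option of every Docker network and flag the ones that end up too large. This is a rough sketch, not something taken from the setup above: flag_oversized_networks is a hypothetical helper, and it assumes, as described here, that a network without an explicit MTU option ends up at 1500.

```shell
#!/bin/sh
# Hypothetical helper: read "name mtu" lines on stdin and print the
# networks whose MTU is above the limit given as $1. A missing MTU is
# treated as 1500 (assumption: Docker's fallback, as observed above).
flag_oversized_networks() {
    awk -v limit="$1" '{
        mtu = ($2 == "" ? 1500 : $2)
        if (mtu + 0 > limit + 0) print $1, mtu
    }'
}

# Usage, feeding it the MTU option of every Docker network:
#   docker network ls --format '{{.Name}}' | while read -r name; do
#       printf '%s %s\n' "$name" "$(docker network inspect "$name" \
#           --format '{{index .Options "com.docker.network.driver.mtu"}}')"
#   done | flag_oversized_networks 1400
```

Networks such as host or none carry no MTU option either, so they will show up in the output and can be ignored.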

I could extend clk k8s to create a network with the correct MTU using docker network create somename --opt com.docker.network.driver.mtu=1400, and then use k3d cluster create --network somename and k3d registry create --default-network somename.
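Put together, that route could look like the sketch below. I did not go down this path, so treat it as untested: run and create_k3d_with_mtu are hypothetical helpers, the commands are the ones quoted above, and the --registry-use wiring between cluster and registry is left out.

```shell
#!/bin/sh
# Hypothetical helper: run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: create a Docker network with an explicit MTU,
# then attach the k3d registry and the k3d cluster to it.
# $1 = network name, $2 = MTU.
create_k3d_with_mtu() {
    network="$1"
    mtu="$2"
    run docker network create "$network" --opt "com.docker.network.driver.mtu=$mtu" &&
    run k3d registry create --default-network "$network" &&
    run k3d cluster create --network "$network"
}

# Usage:
#   DRY_RUN=1 create_k3d_with_mtu somename 1400   # only print the commands
#   create_k3d_with_mtu somename 1400             # actually run them
```

The dry-run switch is only there to review the commands before letting them touch the docker daemon.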

But I got pragmatic and used kind instead, which actually infers the appropriate MTU value: 1400.


  1. The CI job becomes as simple as:

    ci:
      runs-on: [self-hosted]
      steps:
      - uses: earthly/actions-setup@v1
        id: setup
        with:
          version: v0.6.29
      - run: earthly -P --push +ci
    
  2. In the Earthfile I can test my code in a cluster with something like

    myimage:
       FROM alpine
       ...
       SAVE IMAGE myremotename
    
    ci:
        FROM earthly/dind:ubuntu
        RUN apt-get update && apt-get install --yes git wget python3-distutils python3-pytest
        RUN wget -O - https://clk-project.org/install.sh | env CLK_EXTENSIONS=k8s bash
        WORKDIR /app
        COPY --dir myfiles /app
        WITH DOCKER --load 127.0.0.1:5000/myimage:dev=+myimage
            RUN /app/test.sh
        END
    

    And then in the test (I use clk k8s).

    #!/bin/bash
    
    clk k8s flow
    docker push --quiet 127.0.0.1:5000/myimage:dev
    helm dependency update /app/helm
    helm install myapp /app/helm \
         --values /app/helm/values-dev.yaml \
         --set myapp.image.repository=localhost:5000/myimage \
         --set myapp.image.tag=dev
    
    kubectl wait pods -l app.kubernetes.io/name=myimage --for condition=Ready --timeout=2m
    pytest /app/tests
    