Act runner in k8s fails to connect to the Docker daemon

I have successfully provisioned an act runner using docker compose on my host using documentation from gitea.com.

I have also found examples of people running custom Docker DinD containers that run act runner alongside dind. I have this type of act runner running in a k8s cluster.

I install docker and buildx in my action workflow before trying to build a dockerfile.

What I cannot get working: if the workflow uses buildx, it fails to run in the k8s cluster.

In the k8s act runner with DinD, when a workflow using buildx runs, it cannot connect to the Docker daemon via either the socket file or localhost:2375.

I am trying to understand what I am overlooking: the action workflow is running as a container "on the host" (using the docker:// syntax for the label), and buildx cannot reach the Docker daemon.

I found this helpful for k8s/k3s setup:
https://namesny.com/blog/gitea_actions_k3s_docker/

Thanks, I had come across that guide, which eventually seems to just abandon buildx for plain docker. I am running full k8s, not k3s, so I've not seen any filesystem errors.

The link to the k8s act runner is very similar to what I have running. The docker socket is reachable by the act-runner at tcp://localhost:2375.

What I'm stuck on is that when I launch buildx it complains it cannot reach the Docker socket, even when I try different values; I think localhost:2375 makes sense in a DinD pod.

docker buildx create --config ./buildkitd.toml tcp://localhost:2375
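For reference, docker buildx create takes the context or endpoint as a positional argument; a sketch of a named variant (the builder name "k8s-dind" is arbitrary, and this assumes the daemon really is listening un-encrypted on 2375):

```shell
# Hypothetical invocation; "k8s-dind" is an arbitrary builder name
docker buildx create --name k8s-dind --config ./buildkitd.toml \
  tcp://localhost:2375
docker buildx use k8s-dind
```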

So I think I'm stuck trying to understand how to present the dind localhost:2375 inside an act-runner workflow container, which I assume is running in the DinD container?

I want to use buildx because I need caching and cross-platform (QEMU) builds for amd64 and arm64 on amd64 nodes.
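For context, the usual buildx-with-QEMU sequence in a workflow step looks roughly like this (a sketch; the registry and image tag are placeholders, and tonistiigi/binfmt is the QEMU installer image mentioned later in this thread):

```yaml
# Hypothetical workflow step; registry.example.com/app:latest is a placeholder
- name: Build multi-arch image
  run: |
    docker run --privileged --rm tonistiigi/binfmt --install all
    docker buildx create --use --name multiarch
    docker buildx build --platform linux/amd64,linux/arm64 \
      --cache-to type=inline --push -t registry.example.com/app:latest .
```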

Still a problem, @MichaelC?
I can elaborate on some points if needed.

@mariusrugan

I am currently trying to get this to work. I have a Talos cluster,
but I can't get QEMU working.

If you have some pointers, that would be appreciated.

@mariusrugan yes, I still have the same problem. Any help or example manifests would be very helpful.

I've added my current working config in a gist.

I'm running this as a StatefulSet with 2 containers, so the Docker container is external and not privileged.
The ConfigMap is also very important for the whole setup; it's generated by gitea act_runner and adjusted for Docker connectivity outside the act container.

Be aware of the k8s Secret that contains the token.
EDIT: also be aware of the PVC the act_runner needs to store its config.

I've chosen this setup partly to be able to run buildx with QEMU, which you'll find in the action YAML in the gist.

I think it's straightforward; let me know if you have any issues and I'll try to answer.

An action run with this setup, as proof of a correct setup.

Of course it can be taken further; personally I'd bake everything the action does (installing QEMU) into one image and use that as a build harness.

EDIT:
@aladante I'm curious about the Talos part; let me know if the contraption with 2 containers works. My runner is running on a Debian x86_64 machine in a k3s cluster.

Thanks @mariusrugan - I will go through your post!

Thank you, the explanation has been very informative. I'm already really happy with the config; I didn't know that was an option.
I have the same setup as you, with a Docker daemon running next to the act-runner. Going to try it with your config.

I'm also trying to adjust docker_host in the pipeline.
I'll keep you updated on success or failure.
Appreciate the pointers!

IT WORKS!

Massive thanks!
Everything is building and it works.

Much appreciated


Thanks a lot for sharing @mariusrugan.
The sample is not complete, since you need to bring your own Secret and volume, but it is fair to assume that people who have already tried the setup have those. Your example is very helpful; I wish I had found it in the docs instead of the example that does not work…

I think you should add the namespace to the stateful set template.

...
    reloader.stakater.com/auto: "true"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gitea-act-runner-dind
  serviceName: gitea-act-runner-dind
  template:
    metadata:
      namespace: gitea-runners                  # <<<<< HERE (my namespace differs)
      labels:
        app: gitea-act-runner-dind
    spec:
...

Another suggestion I would make is to fix your data volumes so that running more than one replica also works.
For that, remove:

# - name: runner-data
#   persistentVolumeClaim:
#     claimName: act-runner-vol

Add something like:

  volumeClaimTemplates:
    - apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: gitea-runner-storage
        namespace: gitea-runners
      spec:
        storageClassName: nfs-provisioner-ssd
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: "1Gi"

and ensure the volumeMounts include:

          volumeMounts:
            - name: docker-certs
              mountPath: /certs
            - name: gitea-runner-storage             # <<< THAT
              mountPath: /data                       # <<< THAT
          ...

It will create a PersistentVolumeClaim per replica and bind it to the pod started by the StatefulSet.

With that, setting replicas: 2, for instance, will give you 2 runners:

Finally, I would suggest a better name for the runners.
Using the node name is not great: you may have several runners running on one node, and they will all show up with the same name.

Instead, you can use:

            - name: GITEA_RUNNER_NAME
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP

which gives a better way to uniquely identify those runners (I can't add another image, so here is the text version…):

Status	ID	Name	Version	Type	Labels	Last Online Time	Edit
Idle	13	10.244.2.250 ...
Idle	14	10.244.1.243 ...

@mariusrugan - My original approach was to create an image that included a lot of the setup. Thanks so much for a complete and well-written example!

One question off the top: why did I end up with port 2375 while you use port 2376?

I'm also curious about whether it's tcp://docker or tcp://localhost, as this gist example uses 'docker' while act_runner/dind-docker.yaml at main - act_runner - Gitea: Git with a cup of tea uses 'localhost'. The 'docker' hostname doesn't work for me.

The second question: I have set up this configuration inside k8s, and I have a script that calls the following:

docker buildx create 
Creating buildx using 
ERROR: failed to initialize builder vigilant_snyder (vigilant_snyder0): Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
docker run --privileged --rm tonistiigi/binfmt --install all

and it results in this generic socket error:

Login Succeeded

docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?.
See 'docker run --help'.

What I don't understand is that if this is running on the DinD sidecar, there is a socket present, and I'm not sure why this script (running in the action) can't see it.

Subsequent calls will be using buildx to build containers and push them to a registry.

The entire workflow works fine when the act-runner runs on a non-k8s machine, so I'm wondering why this workflow doesn't find the Docker environment it needs.

hi,

#1. 2375 vs. 2376

It is conventional to use port 2375 for un-encrypted, and port 2376 for encrypted communication with the daemon.
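A sketch of what those two conventions look like on the client side, assuming the standard dind DOCKER_TLS_CERTDIR=/certs layout used later in this thread:

```shell
# TLS, port 2376: the client must present the certs the dind container generated
export DOCKER_HOST=tcp://localhost:2376
export DOCKER_TLS_VERIFY=1
export DOCKER_CERT_PATH=/certs/client

# Un-encrypted, port 2375: no TLS variables at all
# export DOCKER_HOST=tcp://localhost:2375

echo "$DOCKER_HOST"
```

Mixing the two (e.g. pointing at 2376 without the TLS variables) is a common source of connection errors.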


#2. I just kicked the action and it runs just fine in my k3s environment. I'm checking diffs; nothing in particular stands out, just
image: docker:26.1.4-dind for DinD.

I have an issue with pushing the image due to the token, in the same Gitea instance - but all the rest works fine, proof in screenshot.

Is the daemon running in your sts?

EDIT:

I restarted the container and it pulled the latest gitea/act_runner:nightly,
and it seems the DinD container is getting into a CrashLoopBackOff.


Could this be your case too?

I understand that the confusion related to the naming is well founded, e.g. why in the action we refer to "docker",

and I think it has to do with Gitea self-hosted runner · GitHub,

which has the following explanations:

tl;dr:
we set that option so that, in the end, the act_runner process communicates with localhost via env vars.

However,

what throws me off in your case is daemon at unix:///var/run/docker.sock, which is not tcp://, so something is off with your configs.
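When a job script reports unix:///var/run/docker.sock, the CLI fell back to its default because DOCKER_HOST never reached the job container. One way to check and force it at the top of the step (a sketch; assumes the sidecar daemon listens on tcp://localhost:2376):

```yaml
# Hypothetical workflow step for debugging the endpoint
- name: Point the CLI at the sidecar daemon
  run: |
    echo "DOCKER_HOST is: ${DOCKER_HOST:-unset}"  # unset => CLI falls back to the unix socket
    export DOCKER_HOST=tcp://localhost:2376
    docker info >/dev/null  # fails fast if the daemon is unreachable
```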

I also see that gitea/act_runner:nightly is not starting,
due to Gitea self-hosted runner · GitHub,
which is at fault due to tini:

Here is the (Ansible) code that provisions the runner:

---
- name: "Create act-runner k8s namespace"
  kubernetes.core.k8s:
    kubeconfig: /home/{{ansible_user}}/.kube/config
    name: act-runner
    api_version: v1
    kind: Namespace
    state: present

- name: "Create act-runner storage claim"
  kubernetes.core.k8s:
    kubeconfig: /home/{{ansible_user}}/.kube/config
    state: present
    definition: 
      kind: PersistentVolumeClaim
      apiVersion: v1
      metadata:
        name: act-runner-vol
        namespace: act-runner
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 200Mi
        storageClassName: longhorn

- name: "Create act-runner secret (registration token)"
  kubernetes.core.k8s:
    kubeconfig: /home/{{ansible_user}}/.kube/config
    state: present
    definition: 
      apiVersion: v1
      data:
        token: "{{ act_runner_token | b64encode }}"
      kind: Secret
      metadata:
        name: runner-secret
        namespace: act-runner
      type: Opaque

- name: "Create act runner configmap"
  kubernetes.core.k8s:
    kubeconfig: /home/{{ansible_user}}/.kube/config
    state: present
    definition: 
      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: act-runner-config
        namespace: act-runner
        labels:
          app: act-runner
      data:
        config.yaml: |
          log:
            level: info

          runner:
            file: .runner
            capacity: 4
            env_file: .env
            timeout: 1h
            insecure: false
            fetch_timeout: 5s
            fetch_interval: 10s
            labels: [
              "bookworm:docker://node:21-bookworm",
              "bookworm-ham:docker://registry.dtmc.ca/gitea/bookworm-ham"
            ]

          cache:
            enabled: true
            dir: ""
            host: ""
            port: 0
            external_server: ""

          container:
            network: ""
            privileged: false
            options:
            workdir_parent:
            valid_volumes: []
            docker_host: ""
            force_pull: false

          host:
            workdir_parent:
          

- name: "Create act-runner deployment"
  kubernetes.core.k8s:
    kubeconfig: /home/{{ansible_user}}/.kube/config
    state: present
    definition: 
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        labels:
          app: act-runner
        name: act-runner
        namespace: act-runner
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: act-runner
        strategy: {}
        template:
          metadata:
            creationTimestamp: null
            labels:
              app: act-runner
            name: act-runner
            namespace: act-runner
          spec:
            restartPolicy: Always
            volumes:
            - name: docker-certs
              emptyDir: {}
            - name: runner-data
              persistentVolumeClaim:
                claimName: act-runner-vol
            - name: act-runner-config
              configMap:
                name: act-runner-config
            containers:
            - name: runner
              image: gitea/act_runner:latest
              command: ["sh", "-c", "while ! nc -z localhost 2376 </dev/null; do echo 'waiting for docker daemon...'; sleep 5; done; /sbin/tini -- /opt/act/run.sh"]
              env:
              - name: DOCKER_HOST
                value: tcp://localhost:2376
              - name: DOCKER_CERT_PATH
                value: /certs/client
              - name: DOCKER_TLS_VERIFY
                value: "1"
              - name: GITEA_INSTANCE_URL
                value: https://gitea.dtmc.ca
              - name: GITEA_RUNNER_REGISTRATION_TOKEN
                valueFrom:
                  secretKeyRef:
                    name: runner-secret
                    key: token
              - name: CONFIG_FILE
                value: /act-config/config.yaml
              volumeMounts:
              - name: docker-certs
                mountPath: /certs
              - name: runner-data
                mountPath: /data
              - name: act-runner-config
                mountPath: /act-config
            - name: daemon
              image: docker:23.0.6-dind
              env:
              - name: DOCKER_TLS_CERTDIR
                value: /certs
              securityContext:
                privileged: true
              volumeMounts:
              - name: docker-certs
                mountPath: /certs

It successfully provisions the act-runner and daemon pods in a k3s cluster, and uses a config with a custom "bookworm-ham" container to run jobs in, which basically has Docker installed plus a bunch of toolchain stuff…

FROM node:22-bookworm AS base

# This is a single-layer addition to bookworm base image to improve CICD loadbuild times
# on act-runners
ENV UBUNTU_RELEASE=jammy
ENV DOCKER_COMPOSE_RELEASE=v2.29.1
ENV TEA_VERSION=v0.9.2

# Actions and scripts can use this ENV variable to skip loadbuid/toolchain installation
# steps if they know it is included in this base image.
ENV SKIP_TOOLCHAIN=1 

RUN apt-get update && apt-get dist-upgrade -y && \
    apt install -y --no-install-recommends ca-certificates software-properties-common openssh-client curl gnupg jq rsync && \
    mkdir -p /etc/apt/keyrings && \
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg |  gpg --dearmor -o /etc/apt/keyrings/docker.gpg && \
    echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu ${UBUNTU_RELEASE} stable" |  tee /etc/apt/sources.list.d/docker.list > /dev/null && \
    apt update -y && \
    apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin && \
    apt-get install -y python3-launchpadlib && \
    #add-apt-repository ppa:longsleep/golang-backports -y && \
    add-apt-repository "deb http://ppa.launchpad.net/longsleep/golang-backports/ubuntu ${UBUNTU_RELEASE} main" && \
    apt-get update && \
    apt-get install -y golang && \
    wget -O- "https://keyserver.ubuntu.com/pks/lookup?fingerprint=on&op=get&search=0x6125E2A8C77F2818FB7BD15B93C4A3FD7BB9C367" | gpg --no-tty --dearmour -o $PWDDIR/ansible-archive-keyring.gpg && \
    echo "deb [trusted=yes] http://ppa.launchpad.net/ansible/ansible/ubuntu ${UBUNTU_RELEASE} main" | tee /etc/apt/sources.list.d/ansible.list && \
    apt update && \
    apt install -y ansible && \
    curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 && \
    chmod 700 get_helm.sh && \
    ./get_helm.sh && \
    git clone https://gitea.com/gitea/tea.git && \
    cd tea  && \
    git checkout ${TEA_VERSION}  && \
    go mod vendor && \
    make  && \
    make install

This toolchain image is built because, in the non-k8s environment, I have installed Docker into the "bookworm-ham" container.

I did this because it otherwise fails to build successfully within a gitea act-runner.

I need this "docker setup" on the non-k8s machine that runs the act-runner to be able to run docker buildx scripts.

This may be where I'm getting my wires crossed… and somehow messing up the overall k8s DinD configuration?

This is the option I'm talking about:

options: "--add-host=docker:host-gateway -v /certs:/certs"
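In the generated config.yaml, that option lives under the container section (the same section quoted in the Ansible ConfigMap above); a fragment, with the other keys left at their generated defaults:

```yaml
container:
  network: ""
  privileged: false
  # make the DinD daemon and its certs reachable from inside job containers
  options: "--add-host=docker:host-gateway -v /certs:/certs"
  valid_volumes: []
  docker_host: ""   # left empty here, as in the generated config
  force_pull: false
```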

I'd go back one step and apply the ConfigMap I'm using by hand,

and then Ansible it.