Act runner in k8s fails to connect to the Docker daemon

I have successfully provisioned an act runner using docker compose on my host using documentation from gitea.com.

I have also found examples of people running custom Docker DinD containers that run act runner alongside dind. I have this type of act runner running in a k8s cluster.

I install docker and buildx in my action workflow before trying to build a dockerfile.

What I cannot get working: if the workflow uses buildx, it fails to run in the k8s cluster.

In the k8s act runner with DinD, when a workflow using buildx runs, it cannot connect to the Docker daemon via either the socket file or localhost:2375.

I am trying to understand what I am overlooking: the action workflow is running as a container "on the host" (using the docker:// syntax for the label), and buildx cannot reach the Docker daemon.

I found this helpful for k8s/k3s setup:
https://namesny.com/blog/gitea_actions_k3s_docker/

Thanks, I had come across that guide, which eventually seems to just abandon buildx for plain docker. I am running full k8s, not k3s, so I've not seen any filesystem errors.

The link to the k8s act runner is very similar to what I have running. The docker socket is reachable by the act-runner at tcp://localhost:2375.

What I'm stuck on is that when I launch buildx it complains it cannot reach the Docker socket, even when I try different values; I think localhost:2375 makes sense in a DinD pod.

docker buildx create --config ./buildkitd.toml tcp://localhost:2375
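For reference, docker buildx create takes the context or endpoint as a positional argument; a sketch of a named variant (the builder name "k8s-dind" is arbitrary, and this assumes the daemon really is listening un-encrypted on 2375):

```shell
# Hypothetical invocation; "k8s-dind" is an arbitrary builder name
docker buildx create --name k8s-dind --config ./buildkitd.toml \
  tcp://localhost:2375
docker buildx use k8s-dind
```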

So I think I'm stuck trying to understand how to present the dind localhost:2375 inside an act-runner workflow container, which I assume is running in the DinD container?

I want to use buildx because I need caching and cross-platform (QEMU) builds for amd64 and arm64 on amd64 nodes.
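For context, the usual buildx-with-QEMU sequence in a workflow step looks roughly like this (a sketch; the registry and image tag are placeholders, and tonistiigi/binfmt is the QEMU installer image mentioned later in this thread):

```yaml
# Hypothetical workflow step; registry.example.com/app:latest is a placeholder
- name: Build multi-arch image
  run: |
    docker run --privileged --rm tonistiigi/binfmt --install all
    docker buildx create --use --name multiarch
    docker buildx build --platform linux/amd64,linux/arm64 \
      --cache-to type=inline --push -t registry.example.com/app:latest .
```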

Still a problem, @MichaelC?
I can elaborate on some points if needed.

@mariusrugan

I am currently trying to get this to work. I have a Talos cluster,
but I can't get QEMU working.

If you have some pointers, that would be appreciated.

@mariusrugan yes, I still have the same problem. Any help or example manifests would be very helpful.

I've added my current working config in a gist.

I'm running this as a StatefulSet with 2 containers, so the Docker container is external and not privileged.
The ConfigMap is also very important for the whole setup; it's generated by gitea act_runner and adjusted for Docker connectivity outside the act container.

Be aware of the k8s Secret that contains the token.
EDIT: also be aware of the PVC the act_runner needs to store its config.

I've chosen this setup partly to be able to run buildx with QEMU, which you'll find in the action YAML in the gist.

I think it's straightforward; let me know if you have any issues and I'll try to answer.

An action run with this setup, as proof of a correct setup.

Of course it can be taken further; personally I'd bake everything the action does (installing QEMU) into one image and use that as a build harness.

EDIT:
@aladante I'm curious about the Talos part; let me know if the contraption with 2 containers works. My runner is running on a Debian x86_64 machine in a k3s cluster.

Thanks @mariusrugan - I will go through your post!

Thank you, the explanation has been very informative. I'm already really happy with the config; I didn't know that was an option.
I have the same setup as you, with a Docker daemon running next to the act-runner. Going to try it with your config.

I'm also trying to adjust docker_host in the pipeline.
I'll keep you updated on success or failure.
Appreciate the pointers!

IT WORKS!

Massive thanks!
Everything is building and it works.

Much appreciated


Thanks a lot for sharing @mariusrugan.
The sample is not complete, since you need to bring your own Secret and volume, but it is fair to assume that people who have already tried the setup have those. Your example is very helpful; I wish I had found it in the docs instead of the example that does not work…

I think you should add the namespace to the stateful set template.

...
    reloader.stakater.com/auto: "true"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gitea-act-runner-dind
  serviceName: gitea-act-runner-dind
  template:
    metadata:
      namespace: gitea-runners                  # <<<<< HERE (my namespace differs)
      labels:
        app: gitea-act-runner-dind
    spec:
...

Another suggestion I would make is to fix your data volumes so that running more than one replica also works.
For that, remove:

# - name: runner-data
#   persistentVolumeClaim:
#     claimName: act-runner-vol

Add something like:

  volumeClaimTemplates:
    - apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: gitea-runner-storage
        namespace: gitea-runners
      spec:
        storageClassName: nfs-provisioner-ssd
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: "1Gi"

and ensure the volumeMounts include:

          volumeMounts:
            - name: docker-certs
              mountPath: /certs
            - name: gitea-runner-storage             # <<< THAT
              mountPath: /data                       # <<< THAT
          ...

It will create a PersistentVolumeClaim per replica and bind it to the pod started by the StatefulSet.

With that, setting replicas: 2, for instance, will give you 2 runners:

Finally, I would suggest a better name for the runners.
Using the node name is not great: you may have several runners running on one node, and they will all show up with the same name.

Instead, you can use:

            - name: GITEA_RUNNER_NAME
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP

which gives a better way to uniquely identify those runners (I can't add another image, so here is the text version…):

Status	ID	Name	Version	Type	Labels	Last Online Time	Edit
Idle	13	10.244.2.250 ...
Idle	14	10.244.1.243 ...

@mariusrugan - My original approach was to create an image that included a lot of the setup. Thanks so much for a complete and well-written example!

One question off the top: why did I end up with port 2375 while you use port 2376?

I'm also curious about whether it's tcp://docker or tcp://localhost, as this gist example uses 'docker' while act_runner/dind-docker.yaml at main - act_runner - Gitea: Git with a cup of tea uses 'localhost'. The 'docker' hostname doesn't work for me.

The second question: I have set up this configuration inside k8s, and I have a script that calls the following:

docker buildx create 
Creating buildx using 
ERROR: failed to initialize builder vigilant_snyder (vigilant_snyder0): Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
docker run --privileged --rm tonistiigi/binfmt --install all

and it results in this generic socket error:

Login Succeeded

docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?.
See 'docker run --help'.

What I don't understand is that if this is running on the DinD sidecar, there is a socket present, and I'm not sure why this script (running in the action) can't see it.

Subsequent calls will be using buildx to build containers and push them to a registry.

The entire workflow works fine when the act-runner runs on a non-k8s machine, so I'm wondering why this workflow doesn't find the Docker environment it needs.

hi,

#1. 2375 vs. 2376

It is conventional to use port 2375 for un-encrypted, and port 2376 for encrypted communication with the daemon.
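A sketch of what those two conventions look like on the client side, assuming the standard dind DOCKER_TLS_CERTDIR=/certs layout used later in this thread:

```shell
# TLS, port 2376: the client must present the certs the dind container generated
export DOCKER_HOST=tcp://localhost:2376
export DOCKER_TLS_VERIFY=1
export DOCKER_CERT_PATH=/certs/client

# Un-encrypted, port 2375: no TLS variables at all
# export DOCKER_HOST=tcp://localhost:2375

echo "$DOCKER_HOST"
```

Mixing the two (e.g. pointing at 2376 without the TLS variables) is a common source of connection errors.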


#2. I just kicked the action and it runs just fine in my k3s environment. I'm checking diffs; nothing in particular stands out, just
image: docker:26.1.4-dind for DinD.

I have an issue with pushing the image due to the token, in the same Gitea instance - but all the rest works fine, proof in screenshot.

Is the daemon running in your sts?

EDIT:

I restarted the container and it pulled the latest gitea/act_runner:nightly,
and it seems the DinD container is getting into a CrashLoopBackOff.


Could this be your case too?

I understand that the confusion related to the naming is well founded, e.g. why in the action we refer to "docker",

and I think it has to do with Gitea self-hosted runner · GitHub,

which has the following explanations:

tl;dr:
we set that option so that, in the end, the act_runner process communicates with localhost via env vars.

However,

what throws me off in your case is daemon at unix:///var/run/docker.sock, which is not tcp://, so something is off with your configs.
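When a job script reports unix:///var/run/docker.sock, the CLI fell back to its default because DOCKER_HOST never reached the job container. One way to check and force it at the top of the step (a sketch; assumes the sidecar daemon listens on tcp://localhost:2376):

```yaml
# Hypothetical workflow step for debugging the endpoint
- name: Point the CLI at the sidecar daemon
  run: |
    echo "DOCKER_HOST is: ${DOCKER_HOST:-unset}"  # unset => CLI falls back to the unix socket
    export DOCKER_HOST=tcp://localhost:2376
    docker info >/dev/null  # fails fast if the daemon is unreachable
```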

I also see that gitea/act_runner:nightly is not starting,
due to Gitea self-hosted runner · GitHub,
which is at fault due to tini:

Here is the (Ansible) code that provisions the runner:

---
- name: "Create act-runner k8s namespace"
  kubernetes.core.k8s:
    kubeconfig: /home/{{ansible_user}}/.kube/config
    name: act-runner
    api_version: v1
    kind: Namespace
    state: present

- name: "Create act-runner storage claim"
  kubernetes.core.k8s:
    kubeconfig: /home/{{ansible_user}}/.kube/config
    state: present
    definition: 
      kind: PersistentVolumeClaim
      apiVersion: v1
      metadata:
        name: act-runner-vol
        namespace: act-runner
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 200Mi
        storageClassName: longhorn

- name: "Create act-runner secret (registration token)"
  kubernetes.core.k8s:
    kubeconfig: /home/{{ansible_user}}/.kube/config
    state: present
    definition: 
      apiVersion: v1
      data:
        token: "{{ act_runner_token | b64encode }}"
      kind: Secret
      metadata:
        name: runner-secret
        namespace: act-runner
      type: Opaque

- name: "Create act runner configmap"
  kubernetes.core.k8s:
    kubeconfig: /home/{{ansible_user}}/.kube/config
    state: present
    definition: 
      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: act-runner-config
        namespace: act-runner
        labels:
          app: act-runner
      data:
        config.yaml: |
          log:
            level: info

          runner:
            file: .runner
            capacity: 4
            env_file: .env
            timeout: 1h
            insecure: false
            fetch_timeout: 5s
            fetch_interval: 10s
            labels: [
              "bookworm:docker://node:21-bookworm",
              "bookworm-ham:docker://registry.dtmc.ca/gitea/bookworm-ham"
            ]

          cache:
            enabled: true
            dir: ""
            host: ""
            port: 0
            external_server: ""

          container:
            network: ""
            privileged: false
            options:
            workdir_parent:
            valid_volumes: []
            docker_host: ""
            force_pull: false

          host:
            workdir_parent:
          

- name: "Create act-runner deployment"
  kubernetes.core.k8s:
    kubeconfig: /home/{{ansible_user}}/.kube/config
    state: present
    definition: 
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        labels:
          app: act-runner
        name: act-runner
        namespace: act-runner
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: act-runner
        strategy: {}
        template:
          metadata:
            creationTimestamp: null
            labels:
              app: act-runner
            name: act-runner
            namespace: act-runner
          spec:
            restartPolicy: Always
            volumes:
            - name: docker-certs
              emptyDir: {}
            - name: runner-data
              persistentVolumeClaim:
                claimName: act-runner-vol
            - name: act-runner-config
              configMap:
                name: act-runner-config
            containers:
            - name: runner
              image: gitea/act_runner:latest
              command: ["sh", "-c", "while ! nc -z localhost 2376 </dev/null; do echo 'waiting for docker daemon...'; sleep 5; done; /sbin/tini -- /opt/act/run.sh"]
              env:
              - name: DOCKER_HOST
                value: tcp://localhost:2376
              - name: DOCKER_CERT_PATH
                value: /certs/client
              - name: DOCKER_TLS_VERIFY
                value: "1"
              - name: GITEA_INSTANCE_URL
                value: https://gitea.dtmc.ca
              - name: GITEA_RUNNER_REGISTRATION_TOKEN
                valueFrom:
                  secretKeyRef:
                    name: runner-secret
                    key: token
              - name: CONFIG_FILE
                value: /act-config/config.yaml
              volumeMounts:
              - name: docker-certs
                mountPath: /certs
              - name: runner-data
                mountPath: /data
              - name: act-runner-config
                mountPath: /act-config
            - name: daemon
              image: docker:23.0.6-dind
              env:
              - name: DOCKER_TLS_CERTDIR
                value: /certs
              securityContext:
                privileged: true
              volumeMounts:
              - name: docker-certs
                mountPath: /certs

It successfully provisions the act-runner and daemon pods in a k3s cluster, and uses a config with a custom "bookworm-ham" container to run jobs in, which basically has Docker installed plus a bunch of toolchain stuff…

FROM node:22-bookworm AS base

# This is a single-layer addition to bookworm base image to improve CICD loadbuild times
# on act-runners
ENV UBUNTU_RELEASE=jammy
ENV DOCKER_COMPOSE_RELEASE=v2.29.1
ENV TEA_VERSION=v0.9.2

# Actions and scripts can use this ENV variable to skip loadbuid/toolchain installation
# steps if they know it is included in this base image.
ENV SKIP_TOOLCHAIN=1 

RUN apt-get update && apt-get dist-upgrade -y && \
    apt install -y --no-install-recommends ca-certificates software-properties-common openssh-client curl gnupg jq rsync && \
    mkdir -p /etc/apt/keyrings && \
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg |  gpg --dearmor -o /etc/apt/keyrings/docker.gpg && \
    echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu ${UBUNTU_RELEASE} stable" |  tee /etc/apt/sources.list.d/docker.list > /dev/null && \
    apt update -y && \
    apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin && \
    apt-get install -y python3-launchpadlib && \
    #add-apt-repository ppa:longsleep/golang-backports -y && \
    add-apt-repository "deb http://ppa.launchpad.net/longsleep/golang-backports/ubuntu ${UBUNTU_RELEASE} main" && \
    apt-get update && \
    apt-get install -y golang && \
    wget -O- "https://keyserver.ubuntu.com/pks/lookup?fingerprint=on&op=get&search=0x6125E2A8C77F2818FB7BD15B93C4A3FD7BB9C367" | gpg --no-tty --dearmour -o $PWDDIR/ansible-archive-keyring.gpg && \
    echo "deb [trusted=yes] http://ppa.launchpad.net/ansible/ansible/ubuntu ${UBUNTU_RELEASE} main" | tee /etc/apt/sources.list.d/ansible.list && \
    apt update && \
    apt install -y ansible && \
    curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 && \
    chmod 700 get_helm.sh && \
    ./get_helm.sh && \
    git clone https://gitea.com/gitea/tea.git && \
    cd tea  && \
    git checkout ${TEA_VERSION}  && \
    go mod vendor && \
    make  && \
    make install

This toolchain image is built because, in the non-k8s environment, I have installed Docker into the "bookworm-ham" container.

I did this because it otherwise fails to build successfully within a gitea act-runner.

I need this "docker setup" on the non-k8s machine that runs the act-runner to be able to run docker buildx scripts.

This may be where I'm getting my wires crossed… and somehow messing up the overall k8s DinD configuration?

This is the option I'm talking about:

options: "--add-host=docker:host-gateway -v /certs:/certs"
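In the generated config.yaml, that option lives under the container section (the same section quoted in the Ansible ConfigMap above); a fragment, with the other keys left at their generated defaults:

```yaml
container:
  network: ""
  privileged: false
  # make the DinD daemon and its certs reachable from inside job containers
  options: "--add-host=docker:host-gateway -v /certs:/certs"
  valid_volumes: []
  docker_host: ""   # left empty here, as in the generated config
  force_pull: false
```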

I'd go back one step and apply the ConfigMap I'm using by hand,

and then Ansible it.