
Bootstrap cluster with FluxCD

This is the next iteration of the "5 AM Club" Kubernetes migration. As you may remember from other entries published in this series, I started playing with Kubernetes daily - not "daily" as a figure of speech, but literally every single day. To be honest, I'm pretty happy with the results. However, my plan had one challenge that needed to be solved: the build time was long, about 15 minutes every day, and sometimes the ESO operator ran into issues with using Kustomize for Helm-based deployments. And, as a purely self-imposed constraint, I wanted to use one tool for bootstrapping the cluster.

That is why I started with the layout presented last time: a logical split per operator, with base/overlay sub-groups.

Then I thought, why not just use some standard solution for that? The first idea was to extend the usage of Argo to my infrastructure. But... that requires ESO, the Doppler secret, and ArgoCD to be installed first. Only then could I reconfigure apps from the Git level.

The challenge was to make it easier, faster, and more standardized than it currently was.

Starting from the beginning

What is Flux? And what is it not?

Flux is a tool for keeping Kubernetes clusters in sync with sources of configuration (like Git repositories), and automating updates to the configuration when there is new code to deploy.

We can use Flux to keep the configuration of our whole cluster in sync with the Git repository. The funny thing is that we can configure both apps and infra with Flux.

However, managing apps with Flux is, in my opinion, not the easiest or most comfortable solution - especially if we change versions quite often and want at least a few dependencies between apps and infra. For example, my Immich instance needs csi-driver-smb, which in turn requires external-secrets-operator and external-secrets-secret (the actual link between the in-cluster secret and the ESO ClusterSecretStore). So every new release needs to be built, all kustomizations need to be checked, and only then can the new version actually be deployed. A very long process. Also, the ArgoCD UI is just better, easier to use, and definitely more user-friendly - at least in my opinion.

Repository structure

So after a few initial rounds it ended up in the following state:

.
├── README.md
├── clusters
│   └── cluster0
│       └── flux-system
│           ├── gotk-components.yaml
│           ├── gotk-sync.yaml
│           ├── infrastructure.yaml
│           └── kustomization.yaml
└── infrastructure
    └── controllers
        ├── argocd-operator
        │   ├── configmap-patch.yaml
        │   ├── kustomization.yaml
        │   ├── namespace.yaml
        │   └── patch-argocd-server-annotations.yaml
        ├── argocd-operator-apps
        │   ├── applications.yaml
        │   ├── kustomization.yaml
        │   ├── projects.yaml
        │   └── repositories.yaml
        ├── csi-driver-smb
        │   ├── csi-driver-smb.yaml
        │   └── kustomization.yaml
        ├── external-secrets
        │   ├── external-secrets-operator.yaml
        │   └── kustomization.yaml
        ├── external-secrets-secret
        │   ├── cluster-secret-store.yaml
        │   └── kustomization.yaml
        ├── tailscale-operator
        │   ├── kustomization.yaml
        │   └── tailscale-operator.yaml
        ├── tailscale-operator-secrets
        │   ├── kustomization.yaml
        │   └── tailscale-operator-exteral-secret.yaml
        └── traefik
            ├── kustomization.yaml
            └── traefik-ext-conf.yaml

Now we need some explanation, right?

Flux configuration

clusters
└── cluster0
    └── flux-system
        ├── gotk-components.yaml
        ├── gotk-sync.yaml
        ├── infrastructure.yaml
        └── kustomization.yaml

Here we have the configuration of our cluster, which is an awesome idea: in one repository we can keep configurations for multiple clusters, split by provider, environment, or location, and manage them in a very simple way. Then we have the regular flux-system files, so flux-system/gotk-components.yaml and flux-system/gotk-sync.yaml.
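
gotk-components.yaml holds the manifests of the Flux controllers themselves, while gotk-sync.yaml wires the cluster to this repository. I won't paste the generated file, but as a rough sketch (the exact URL, branch, and intervals depend on the bootstrap flags used later), it boils down to a GitRepository plus a root Kustomization pointing at ./clusters/cluster0:

---
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: flux-system
  namespace: flux-system
spec:
  # URL, branch and intervals are filled in by the bootstrap command
  interval: 1m0s
  ref:
    branch: main
  secretRef:
    name: flux-system
  url: ssh://[email protected]/3sky/flux-at-home
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 10m0s
  path: ./clusters/cluster0
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system

Next, let's talk about my simple Kustomization file: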

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
  - infrastructure.yaml

This just tells Flux that, after bootstrapping itself, it should apply the manifests defined in the infrastructure.yaml file. So let's take a look at the most crucial part of the config.

---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: csi-driver-smb
  namespace: flux-system
spec:
  interval: 1h
  retryInterval: 1m
  timeout: 5m
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./infrastructure/controllers/csi-driver-smb
  prune: true
  wait: true
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: external-secrets
  namespace: flux-system
spec:
  interval: 1h
  retryInterval: 1m
  timeout: 5m
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./infrastructure/controllers/external-secrets
  prune: true
  wait: true
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: external-secrets-secret
  namespace: flux-system
spec:
  interval: 1h
  retryInterval: 1m
  timeout: 5m
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./infrastructure/controllers/external-secrets-secret
  dependsOn:
    - name: external-secrets
  prune: true
  wait: true
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: tailscale-operator
  namespace: flux-system
spec:
  interval: 1h
  retryInterval: 1m
  timeout: 5m
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./infrastructure/controllers/tailscale-operator
  dependsOn:
    - name: external-secrets-secret
  prune: true
  wait: true
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: tailscale-operator-secret
  namespace: flux-system
spec:
  interval: 1h
  retryInterval: 1m
  timeout: 5m
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./infrastructure/controllers/tailscale-operator-secrets
  dependsOn:
    - name: external-secrets
  prune: true
  wait: true
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: traefik
  namespace: flux-system
spec:
  interval: 1h
  retryInterval: 1m
  timeout: 5m
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./infrastructure/controllers/traefik
  prune: true
  wait: true
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: argocd-operator
  namespace: flux-system
spec:
  interval: 1h
  retryInterval: 1m
  timeout: 5m
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./infrastructure/controllers/argocd-operator
  dependsOn:
    - name: tailscale-operator
  prune: true
  wait: true
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: argocd-apps
  namespace: flux-system
spec:
  interval: 1h
  retryInterval: 1m
  timeout: 5m
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./infrastructure/controllers/argocd-operator-apps
  dependsOn:
    - name: argocd-operator
  prune: true
  wait: true

As you can see, I rely heavily on dependsOn. That is due to the design of Flux and Kubernetes: when we apply a Kustomization, we do not control the order in which resources are created. Of course, if we put a Service, an Ingress, and a Deployment in one file, Kubernetes will know how to handle that. The problem is when we deploy applications that depend on each other - Kubernetes does not understand those relationships, so we need to specify them explicitly. In my example csi-driver-smb, external-secrets, and traefik can be installed in parallel, but then we have some relations. At the end, we install Argo's app-of-apps, which in general requires all the previous components besides traefik - I'm routing my traffic there through Tailscale, not via the Internet.
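
Because every Kustomization here has wait: true, Flux only marks it as ready once all of its resources are healthy, and anything listed in dependsOn is held back until its dependency reports Ready. The easiest way to watch that ordering play out after bootstrap is the Flux CLI:

flux get kustomizations --watch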

Now you may think:

App-of-apps? You said infra only!

That is right - my logic here was quite convoluted, to be honest. Do you remember when I wrote about the use case? Bootstrapping the whole cluster so that I can work with it fast. Apps are part of the overall cluster, after all. So my app-of-apps definition was very simple:

infrastructure/controllers/argocd-operator-apps
├── applications.yaml
├── kustomization.yaml
├── projects.yaml
└── repositories.yaml

Let's dive into it, one file at a time.

  1. applications.yaml

     ---
     apiVersion: argoproj.io/v1alpha1
     kind: Application
     metadata:
       name: self-sync
       namespace: argocd
     spec:
       syncPolicy:
         automated:
           selfHeal: true
       project: non-core-namespaces
       source:
         repoURL: https://codeberg.org/3sky/argocd-for-home
         targetRevision: HEAD
         path: app-of-apps
       destination:
         server: https://kubernetes.default.svc
         namespace: argocd
    

     That is a very simple definition of my main Application, which controls all apps running on my cluster. Nothing very special here; I was experimenting with Codeberg, but let's talk about that later.

  2. kustomization.yaml

     ---
     apiVersion: kustomize.config.k8s.io/v1beta1
     kind: Kustomization

     namespace: argocd
     resources:
       - repositories.yaml
       - projects.yaml
       - applications.yaml
    

     The Kustomization is simple, just ordered along the logical path:

    Repository -> Projects -> Application

  3. projects.yaml

     ---
     apiVersion: argoproj.io/v1alpha1
     kind: AppProject
     metadata:
       name: non-core-namespaces
       namespace: argocd
     spec:
       description: Allow argo deploy everywhere
       sourceRepos:
         - 'https://codeberg.org/3sky/argocd-for-home'
       destinations:
         - namespace: '*'
           server: https://kubernetes.default.svc
       namespaceResourceWhitelist:
         - group: '*'
           kind: '*'
       clusterResourceWhitelist:
         - group: '*'
           kind: '*'
    

     As I'm creating namespaces as part of my application setup, the permissions granted to Argo's project are very wide. In this case, I just trust the GitOps process.

  4. repositories.yaml

     apiVersion: external-secrets.io/v1beta1
     kind: ExternalSecret
     metadata:
       name: gitops-with-argo-secret
       namespace: argocd
     spec:
       refreshInterval: 6h
       secretStoreRef:
         name: doppler-auth-argocd
         kind: ClusterSecretStore
       target:
         name: gitops-with-argo
         creationPolicy: Owner
         template:
           type: Opaque
           metadata:
             labels:
               argocd.argoproj.io/secret-type: repository
           data:
             type: git
             url: https://codeberg.org/3sky/argocd-for-home
             username: 3sky
             password: "{{ .password }}"
       data:
         - secretKey: password
           remoteRef:
             key: CODEBERG_TOKEN
    

     This is probably the most interesting part of the configuration. We're unable to inject the password directly into a secret labeled argocd.argoproj.io/secret-type: repository (which is a regular Kubernetes Secret), so we need to generate the whole object with ESO - details can be found here.
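
     The doppler-auth-argocd store referenced above is the one defined in external-secrets-secret/cluster-secret-store.yaml. I'm not pasting my exact file here, but a minimal sketch - assuming the Doppler provider and the doppler-token-argocd secret created during the bootstrap steps below - boils down to:

     ---
     apiVersion: external-secrets.io/v1beta1
     kind: ClusterSecretStore
     metadata:
       name: doppler-auth-argocd
     spec:
       provider:
         doppler:
           # Doppler service token stored as a plain Kubernetes Secret,
           # created manually in step 4 of the bootstrap process below
           auth:
             secretRef:
               dopplerToken:
                 name: doppler-token-argocd
                 key: dopplerToken
                 namespace: external-secrets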

And that's it. Now let's talk about the actual bootstrap process.

Bootstrapping the environment

  1. Spin up infrastructure with Terraform.

    terraform apply -var="local_ip=$(curl -s ifconfig.me)"
    
  2. Install K3S with a local fork of k3s-ansible.

    ansible-playbook playbooks/site.yml -i inventory.yml
    
  3. Load KUBECONFIG

    export KUBECONFIG=/home/kuba/.kube/config.hetzner-prod
    kubectl config use-context k3s-ansible
    
  4. Configure Doppler (we need an initial secret somewhere)

    kubectl create namespace external-secrets
    kubectl create secret generic \
      -n external-secrets doppler-token-argocd \
      --from-literal dopplerToken=""
    
  5. Bootstrap the cluster

    flux bootstrap github \
      --owner=3sky \
      --repository=flux-at-home \
      --branch=main \
      --path=./clusters/cluster0 \
      --personal
    

    GitHub is supported natively here (via GitHub Apps), while solutions like Codeberg need a more direct method:

    flux bootstrap git \
      --url=ssh://[email protected]/3sky/flux-at-home.git \
      --branch=main \
      --path=./clusters/cluster0 \
      --private-key-file=/home/kuba/.ssh/id_ed25519_git
    
  6. Get ArgoCD password if needed

    kubectl --namespace argocd \
      get secret argocd-initial-admin-secret \
      -o json | jq -r '.data.password' | base64 -d
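
As a final sanity check, the Flux CLI can confirm that the controllers are healthy, and force a reconciliation instead of waiting for the next sync interval:

flux check
flux reconcile kustomization flux-system --with-source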
    

Summary

With this set of commands, I'm able to set up a fresh cluster, on Hetzner or anywhere else, in literally 7 minutes - starting from installing k3s and ending with applications exposed via a Cloudflare tunnel or a Tailscale tailnet. To be honest, I'm really satisfied with the result of this exercise. Besides a long-running cluster for my self-hosted apps, I can quite easily, quickly, and cheaply test new versions of ArgoCD or the smb-driver operator without harming my current setup! And that is great, to be honest. Especially for self-hosting, where it's good to have apps working in, let's say, production mode, even if there is only one user.