Bootstrap cluster with FluxCD

This is the next iteration of the "5 AM Club" Kubernetes migration. As you may remember from other entries in this series, I started playing with Kubernetes daily - not just "regularly", but literally every single day. To be honest, I'm pretty happy with the results. However, my plan had one challenge that needed to be solved: the build time was long (~15 minutes every day), and sometimes the ESO operator ran into an issue with using kustomize for a Helm-based deployment. And, as a self-imposed constraint, I wanted to use one tool for bootstrapping the cluster.
That is why I started with the layout presented last time: a logical split per operator, with base/overlay sub-groups.
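Roughly, that meant a tree like this per operator (the directory and file names here are illustrative, not the exact ones from the previous post):

external-secrets/
├── base
│   ├── kustomization.yaml
│   └── external-secrets-operator.yaml
└── overlays
    └── prod
        └── kustomization.yaml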
Then I thought, why not just use some standard solution for that? The first idea was to extend the usage of Argo to my infrastructure. But... it requires ESO, the Doppler secret, and Argo CD itself to be installed first. Only then can I reconfigure apps from the Git level.
The challenge was to make it easier, faster, and more standardized than what I had at that point.
Starting from the beginning
What is Flux? And what is it not?
Flux is a tool for keeping Kubernetes clusters in sync with sources of configuration (like Git repositories), and automating updates to the configuration when there is new code to deploy.
We can use Flux to keep the configuration of our whole cluster in sync with a Git repository. The nice thing is that we can configure both apps and infra with Flux.
However, managing apps with Flux is, in my opinion, not the easiest and most comfortable solution - especially if we're changing versions quite often and would like to have at least a few dependencies between apps and infra. For example, my Immich instance needs csi-driver-smb, which in turn requires external-secret-operator and external-secret-secret (the actual link between the in-cluster secret and the ESO ClusterSecretStore). So every new release needs to be built and checked to make sure all kustomizations are in place, and only then actually deployed. A very long process. Also, the ArgoCD UI is just better, easier to use, and definitely more user-friendly - at least in my opinion.
Repository structure
So after a few initial rounds it ended up in the following state:
.
├── README.md
├── clusters
│   └── cluster0
│       └── flux-system
│           ├── gotk-components.yaml
│           ├── gotk-sync.yaml
│           ├── infrastructure.yaml
│           └── kustomization.yaml
└── infrastructure
    └── controllers
        ├── argocd-operator
        │   ├── configmap-patch.yaml
        │   ├── kustomization.yaml
        │   ├── namespace.yaml
        │   └── patch-argocd-server-annotations.yaml
        ├── argocd-operator-apps
        │   ├── applications.yaml
        │   ├── kustomization.yaml
        │   ├── projects.yaml
        │   └── repositories.yaml
        ├── csi-driver-smb
        │   ├── csi-driver-smb.yaml
        │   └── kustomization.yaml
        ├── external-secrets
        │   ├── external-secrets-operator.yaml
        │   └── kustomization.yaml
        ├── external-secrets-secret
        │   ├── cluster-secret-store.yaml
        │   └── kustomization.yaml
        ├── tailscale-operator
        │   ├── kustomization.yaml
        │   └── tailscale-operator.yaml
        ├── tailscale-operator-secrets
        │   ├── kustomization.yaml
        │   └── tailscale-operator-exteral-secret.yaml
        └── traefik
            ├── kustomization.yaml
            └── traefik-ext-conf.yaml
Now we need some explanation, right?
Flux configuration
clusters
└── cluster0
    └── flux-system
        ├── gotk-components.yaml
        ├── gotk-sync.yaml
        ├── infrastructure.yaml
        └── kustomization.yaml
Here we have the configuration of our cluster, which is an awesome idea: in one repository we can keep the configuration for multiple clusters, split by provider, environment, or location, and manage them in a very simple way.
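For example, adding a second cluster would just mean another directory next to cluster0 (the name below is hypothetical, not part of my current repo):

clusters
├── cluster0
│   └── flux-system
└── cluster1-staging
    └── flux-system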
Then we have the regular flux-system files: flux-system/gotk-components.yaml (the manifests for the Flux controllers themselves) and flux-system/gotk-sync.yaml (the GitRepository source plus the root Kustomization that points back at this cluster path).
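For reference, the generated gotk-sync.yaml looks roughly like this - a sketch based on Flux defaults and my repository, since the exact URL and intervals depend on the bootstrap flags:

---
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 1m0s
  ref:
    branch: main
  secretRef:
    name: flux-system
  url: ssh://git@codeberg.org/3sky/flux-at-home.git
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 10m0s
  path: ./clusters/cluster0
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system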
Next, let's talk about my simple Kustomization file:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
  - infrastructure.yaml
This just tells Flux that, after bootstrapping itself, it should install the manifests referenced by the infrastructure.yaml file. So let's take a look at the most crucial part of the config.
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: csi-driver-smb
  namespace: flux-system
spec:
  interval: 1h
  retryInterval: 1m
  timeout: 5m
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./infrastructure/controllers/csi-driver-smb
  prune: true
  wait: true
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: external-secrets
  namespace: flux-system
spec:
  interval: 1h
  retryInterval: 1m
  timeout: 5m
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./infrastructure/controllers/external-secrets
  prune: true
  wait: true
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: external-secrets-secret
  namespace: flux-system
spec:
  interval: 1h
  retryInterval: 1m
  timeout: 5m
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./infrastructure/controllers/external-secrets-secret
  dependsOn:
    - name: external-secrets
  prune: true
  wait: true
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: tailscale-operator
  namespace: flux-system
spec:
  interval: 1h
  retryInterval: 1m
  timeout: 5m
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./infrastructure/controllers/tailscale-operator
  dependsOn:
    - name: external-secrets-secret
  prune: true
  wait: true
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: tailscale-operator-secret
  namespace: flux-system
spec:
  interval: 1h
  retryInterval: 1m
  timeout: 5m
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./infrastructure/controllers/tailscale-operator-secrets
  dependsOn:
    - name: external-secrets
  prune: true
  wait: true
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: traefik
  namespace: flux-system
spec:
  interval: 1h
  retryInterval: 1m
  timeout: 5m
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./infrastructure/controllers/traefik
  prune: true
  wait: true
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: argocd-operator
  namespace: flux-system
spec:
  interval: 1h
  retryInterval: 1m
  timeout: 5m
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./infrastructure/controllers/argocd-operator
  dependsOn:
    - name: tailscale-operator
  prune: true
  wait: true
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: argocd-apps
  namespace: flux-system
spec:
  interval: 1h
  retryInterval: 1m
  timeout: 5m
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./infrastructure/controllers/argocd-operator-apps
  dependsOn:
    - name: argocd-operator
  prune: true
  wait: true
As you can see, I heavily rely on the dependsOn field. That is due to Flux and Kubernetes architecture design: when we apply a Kustomization, we do not control the order in which resources are created. Of course, if we put a Service, an Ingress, and a Deployment in one file, Kubernetes will know how to handle that. The problem appears when we deploy applications that depend on each other - that is not something Kubernetes understands on its own, so we need to specify it explicitly.
In my example, csi-driver-smb, external-secrets, and traefik can be installed in parallel, but after that we have some relations. At the end, we're installing Argo's app-of-apps, which in general requires all the previous components besides traefik - I'm routing that traffic through Tailscale, not via the Internet.
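After pushing the config, the whole chain can be watched reconciling in the expected order with the Flux CLI (a quick sanity check, not strictly part of the setup):

flux get kustomizations --watch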
Now you might think:
App-of-apps? You said infra only!
That is right, and my logic was quite convoluted here, to be honest. Do you remember when I wrote about the use case? Bootstrapping the whole cluster so that I can start working with it fast. Apps are, in the end, part of the overall cluster. So my app-of-apps definition was very simple:
infrastructure/controllers/argocd-operator-apps
├── applications.yaml
├── kustomization.yaml
├── projects.yaml
└── repositories.yaml
Let's dive into them one by one.
applications.yaml
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: self-sync
  namespace: argocd
spec:
  syncPolicy:
    automated:
      selfHeal: true
  project: non-core-namespaces
  source:
    repoURL: https://codeberg.org/3sky/argocd-for-home
    targetRevision: HEAD
    path: app-of-apps
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
That is a very simple definition of my main Application, which controls all apps running on my cluster. Nothing very special here; I was experimenting with Codeberg, but let's talk about that later.

kustomization.yaml
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: argocd
resources:
  - repositories.yaml
  - projects.yaml
  - applications.yaml
The Kustomization is simple, with resources ordered along the logical path: Repository -> Projects -> Application.

projects.yaml
---
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: non-core-namespaces
  namespace: argocd
spec:
  description: Allow argo deploy everywhere
  sourceRepos:
    - 'https://codeberg.org/3sky/argocd-for-home'
  destinations:
    - namespace: '*'
      server: https://kubernetes.default.svc
  namespaceResourceWhitelist:
    - group: '*'
      kind: '*'
  clusterResourceWhitelist:
    - group: '*'
      kind: '*'
As I'm creating namespaces as part of my application setup, the permissions granted to Argo's Project are very wide. In this case, I just trust the GitOps process.

repositories.yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: gitops-with-argo-secret
  namespace: argocd
spec:
  refreshInterval: 6h
  secretStoreRef:
    name: doppler-auth-argocd
    kind: ClusterSecretStore
  target:
    name: gitops-with-argo
    creationPolicy: Owner
    template:
      type: Opaque
      metadata:
        labels:
          argocd.argoproj.io/secret-type: repository
      data:
        type: git
        url: https://codeberg.org/3sky/argocd-for-home
        username: 3sky
        password: "{{ .password }}"
  data:
    - secretKey: password
      remoteRef:
        key: CODEBERG_TOKEN
This is probably the most interesting part of the configuration. As we're unable to inject our password directly into the repository Secret (the one labeled argocd.argoproj.io/secret-type: repository, which is a regular Kubernetes Secret), we need to generate the whole object with ESO - details can be found here.
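Once ESO reconciles the ExternalSecret, the generated repository Secret can be double-checked with kubectl (the names are taken from the manifest above):

kubectl -n argocd get externalsecret gitops-with-argo-secret
kubectl -n argocd get secret gitops-with-argo --show-labels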
And that's it. Now let's talk about the actual bootstrap process.
Bootstrapping the environment
Spin up infrastructure with Terraform.
terraform apply -var="local_ip=$(curl -s ifconfig.me)"
Install K3s with a local fork of k3s-ansible.
ansible-playbook playbooks/site.yml -i inventory.yml
Load KUBECONFIG
export KUBECONFIG=/home/kuba/.kube/config.hetzner-prod
kubectl config use-context k3s-ansible
Doppler configuration (we need an initial secret somewhere)
kubectl create namespace external-secrets
kubectl create secret generic \
  -n external-secrets doppler-token-argocd \
  --from-literal dopplerToken=""
Bootstrap the cluster
flux bootstrap github \
  --owner=3sky \
  --repository=flux-at-home \
  --branch=main \
  --path=./clusters/cluster0 \
  --personal
GitHub is supported natively (via GitHub Apps), while solutions like Codeberg need the more generic git bootstrap.
flux bootstrap git \
  --url=ssh://git@codeberg.org/3sky/flux-at-home.git \
  --branch=main \
  --path=./clusters/cluster0 \
  --private-key-file=/home/kuba/.ssh/id_ed25519_git
Get ArgoCD password if needed
kubectl --namespace argocd \
  get secret argocd-initial-admin-secret \
  -o json | jq -r '.data.password' | base64 -d
Summary
With this set of commands, I'm able to set up a fresh cluster, on Hetzner or wherever, in literally 7 minutes - starting from installing k3s and ending with applications exposed via a Cloudflare tunnel or the Tailscale tailnet. To be honest, I'm really satisfied with the result of this exercise. Besides the long-running cluster for my self-hosted apps, I can easily, quickly, and cheaply test new versions of ArgoCD or the smb-driver operator without harming my current setup! And that is great, especially for self-hosting, where it's good to have apps running in, let's say, production mode, even if there is only one user.