Automating Kubernetes Cluster Updates with Ansible and AWX
For initial cluster deployment, I recommend using Techno Tim’s k3s-ansible playbook. It makes it easy to deploy a k3s cluster on Proxmox VMs in your homelab, handling the complexity of cluster setup so you can focus on running workloads rather than infrastructure.
Keeping a Kubernetes cluster up to date can be challenging, especially when you need to maintain high availability and handle components like kube-vip. This guide shows you how to automate cluster updates using Ansible and AWX.
The Challenge
Updating a Kubernetes cluster requires careful orchestration to:
- Maintain cluster availability during updates
- Handle both worker and server nodes differently
- Safely migrate critical components like kube-vip
- Drain nodes before updates
- Apply system updates
- Restore nodes to service
The Solution: An Ansible Playbook
This solution uses an Ansible playbook that handles these requirements with two main plays: one for worker nodes and one for server nodes.
Worker Node Updates
The worker node play updates k3s worker nodes one at a time:
- Drain the node so no pods are running on it
- Stop the k3s agent service
- Apply system updates and reboot the node
- Restart the k3s agent service
- Uncordon the node so it can receive workloads again
- name: Update K3s Worker Nodes Serially
  hosts: k3s_workers
  serial: 1
  become: true
  vars:
    k3s_server: "{{ groups['k3s_servers'][0] }}"  # server used for kubectl commands
    k3s_agent_service: "k3s-agent"                # default agent service name from the k3s install script
  tasks:
    - name: Check if workers should be updated
      meta: end_play
      when: not (update_workers | default(true) | bool)

    - name: Get node name from kubectl
      shell: kubectl get nodes -o name | grep -i "{{ inventory_hostname }}"
      delegate_to: "{{ k3s_server }}"
      become: true
      register: node_name
      changed_when: false

    - name: Drain the worker node
      command: >
        kubectl drain {{ node_name.stdout }}
        --ignore-daemonsets
        --delete-emptydir-data
      delegate_to: "{{ k3s_server }}"
      become: true

    - name: Stop K3s agent
      systemd:
        name: "{{ k3s_agent_service }}"
        state: stopped
      ignore_errors: true

    - name: Apply system updates (Debian-based)
      apt:
        update_cache: true
        upgrade: dist
      when: ansible_os_family == "Debian"

    - name: Apply system updates (RHEL-based)
      yum:
        name: "*"
        state: latest
      when: ansible_os_family == "RedHat"

    - name: Reboot the worker node
      reboot:
        reboot_timeout: 600

    - name: Wait for node to be reachable
      wait_for:
        host: "{{ ansible_host }}"
        port: 22
        delay: 30
        timeout: 300
      delegate_to: localhost  # check SSH from the control node
      become: false

    - name: Restart K3s agent
      systemd:
        name: "{{ k3s_agent_service }}"
        state: started

    - name: Uncordon the worker node
      command: kubectl uncordon {{ node_name.stdout }}
      delegate_to: "{{ k3s_server }}"
      become: true
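The node-lookup task matches the Kubernetes node object against the Ansible inventory hostname with a case-insensitive grep, which tolerates capitalization differences between the two. A quick illustration with simulated output (these are sample node names, not live cluster data):

```shell
# Simulated `kubectl get nodes -o name` output; grep -i matches regardless of case
printf 'node/K3S-WORKER01\nnode/k3s-worker02\n' | grep -i "k3s-worker01"
# → node/K3S-WORKER01
```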
Server Node Updates
The server node play updates k3s server nodes one at a time, delegating kubectl commands to a server that is not currently being updated:
- Pin the kube-vip DaemonSet to control-plane nodes and wait for it to be healthy
- Drain the node to ensure no pods are running on it
- Stop the k3s server service
- Apply system updates and reboot
- Restart the k3s server service and wait for the node to be Ready
- Uncordon the node and verify kube-vip is running on all control-plane nodes
- name: Update K3s Server Nodes Serially
  hosts: k3s_servers
  serial: 1
  become: true
  vars:
    k3s_service: "k3s"
    # Delegate kubectl to a server that is not the one currently being updated
    active_server: "{{ groups['k3s_servers'] | difference([inventory_hostname]) | first }}"
  tasks:
    - name: Check if servers should be updated
      meta: end_play
      when: not (update_servers | default(true) | bool)

    - name: Get kube-vip DaemonSet name
      ansible.builtin.set_fact:
        kubevip_ds_name: "kube-vip-ds"
      tags: kubevip-config

    - name: Ensure kube-vip DaemonSet is pinned to control-plane nodes
      ansible.builtin.command:
        cmd: >-
          kubectl patch daemonset {{ kubevip_ds_name }} -n kube-system --type=json -p '[
          {"op":"replace","path":"/spec/template/spec/nodeSelector","value":{"node-role.kubernetes.io/master":"true"}},
          {"op":"replace","path":"/spec/template/spec/tolerations","value":[
          {"key":"node-role.kubernetes.io/master","operator":"Exists","effect":"NoSchedule"}
          ]}
          ]'
      delegate_to: "{{ active_server }}"
      become: true
      register: final_patch_result
      changed_when: final_patch_result.rc == 0
      tags: kubevip-config

    - name: Wait for kube-vip to reach desired count
      ansible.builtin.shell:
        cmd: "kubectl get daemonset {{ kubevip_ds_name }} -n kube-system -o jsonpath='{.status.desiredNumberScheduled}'"
      delegate_to: "{{ active_server }}"
      become: true
      register: final_kubevip_count
      until: final_kubevip_count.stdout | int == groups['k3s_servers'] | length
      retries: 30
      delay: 10
      changed_when: false
      tags: kubevip-config

    - name: Get node name from kubectl
      shell: kubectl get nodes -o name | grep -i "{{ inventory_hostname }}"
      delegate_to: "{{ active_server }}"
      become: true
      register: node_name_raw
      changed_when: false

    - name: Extract matching node name
      set_fact:
        node_name: "{{ node_name_raw.stdout | replace('node/', '') }}"

    - name: Drain the server node
      command: >-
        kubectl drain {{ node_name }}
        --ignore-daemonsets
        --delete-emptydir-data
      delegate_to: "{{ active_server }}"
      become: true

    - name: Stop K3s server
      systemd:
        name: "{{ k3s_service }}"
        state: stopped

    - name: Apply system updates (Debian-based)
      apt:
        update_cache: true
        upgrade: dist
      when: ansible_os_family == "Debian"

    - name: Apply system updates (RHEL-based)
      yum:
        name: "*"
        state: latest
        update_cache: true
      when: ansible_os_family == "RedHat"

    - name: Reboot the server node
      reboot:
        reboot_timeout: 600
        test_command: uptime

    - name: Wait for server node to be reachable
      wait_for:
        host: "{{ ansible_host }}"
        port: 22
        delay: 30
        timeout: 300
      delegate_to: localhost  # check SSH from the control node
      become: false

    - name: Restart K3s server
      systemd:
        name: "{{ k3s_service }}"
        state: started
        enabled: true

    - name: Wait for server node to be Ready
      command: >-
        kubectl wait --for=condition=Ready
        node/{{ node_name }}
        --timeout=300s
      delegate_to: "{{ active_server }}"
      become: true
      register: node_ready
      changed_when: false
      retries: 10
      delay: 30
      until: node_ready.rc == 0

    - name: Uncordon the server node
      ansible.builtin.command:
        cmd: "kubectl uncordon {{ node_name }}"
      delegate_to: "{{ active_server }}"
      become: true

    - name: Verify kube-vip DaemonSet is running on all control plane nodes
      shell: "kubectl get daemonset {{ kubevip_ds_name }} -n kube-system -o jsonpath='{.status.numberReady}'"
      delegate_to: "{{ active_server }}"
      become: true
      register: kubevip_ready_count
      until: kubevip_ready_count.stdout | int == groups['k3s_servers'] | length
      retries: 30
      delay: 10
      changed_when: false
      ignore_errors: true  # let the reapply task below run if the check times out

    - name: Reapply kube-vip DaemonSet configuration if not running on all nodes
      ansible.builtin.command:
        cmd: >-
          kubectl patch daemonset {{ kubevip_ds_name }} -n kube-system --type=json -p '[
          {"op":"replace","path":"/spec/template/spec/nodeSelector","value":{"node-role.kubernetes.io/master":"true"}},
          {"op":"replace","path":"/spec/template/spec/tolerations","value":[
          {"key":"node-role.kubernetes.io/master","operator":"Exists","effect":"NoSchedule"}
          ]}
          ]'
      delegate_to: "{{ active_server }}"
      become: true
      when: kubevip_ready_count.stdout | int != groups['k3s_servers'] | length
      register: reapply_patch_result
      changed_when: reapply_patch_result.rc == 0
      tags: kubevip-config

    - name: Wait for kube-vip to be ready after reapply
      shell: "kubectl get daemonset {{ kubevip_ds_name }} -n kube-system -o jsonpath='{.status.numberReady}'"
      delegate_to: "{{ active_server }}"
      become: true
      register: final_kubevip_ready_count
      until: final_kubevip_ready_count.stdout | int == groups['k3s_servers'] | length
      retries: 30
      delay: 10
      changed_when: false
      when: reapply_patch_result is changed
      tags: kubevip-config
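The “Extract matching node name” step strips the `node/` prefix that `kubectl get nodes -o name` adds, so later commands like `kubectl wait node/<name>` can build the reference themselves. The same transformation in plain shell, using a sample value rather than live cluster output:

```shell
raw="node/k3s-server01"   # stand-in for: kubectl get nodes -o name | grep -i "$HOSTNAME"
node="${raw#node/}"       # remove the node/ prefix
echo "$node"
# → k3s-server01
```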
Create an inventory file
[k3s_servers]
k3s-server01 ansible_host=192.168.10.71 ansible_user=ansible
k3s-server02 ansible_host=192.168.10.72 ansible_user=ansible
k3s-server03 ansible_host=192.168.10.73 ansible_user=ansible
[k3s_workers]
k3s-worker01 ansible_host=192.168.10.74 ansible_user=ansible
k3s-worker02 ansible_host=192.168.10.75 ansible_user=ansible
k3s-worker03 ansible_host=192.168.10.76 ansible_user=ansible
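Before the first run, it’s worth confirming the inventory parses the way you expect. If Ansible is installed, `ansible-inventory -i inventory.ini --graph` prints the group tree; as a dependency-free sanity check, this extracts just the host names (a sketch, assuming the INI layout above):

```shell
# Write a minimal copy of the inventory, then print the first column of
# every non-section, non-empty line (i.e. the host names)
cat > /tmp/inventory.ini <<'EOF'
[k3s_servers]
k3s-server01 ansible_host=192.168.10.71 ansible_user=ansible

[k3s_workers]
k3s-worker01 ansible_host=192.168.10.74 ansible_user=ansible
EOF
awk '!/^\[/ && NF {print $1}' /tmp/inventory.ini
# → k3s-server01
# → k3s-worker01
```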
Automating with AWX
AWX is the open-source upstream project behind Red Hat’s Ansible Automation Platform (formerly Ansible Tower). It provides a web-based user interface and REST API for managing Ansible playbooks, adding features like role-based access control, job scheduling, and detailed logging to your automation.
Project Setup
- Navigate to Resources → Projects
- Click Add
- Configure:
- Name: k3s Cluster Updates
- Source Control Type: Git
- Source Control URL: Your repository URL
- Update Revision on Launch: ✓
Inventory Setup
- Go to Resources → Inventories
- Create a new inventory
Job Template Setup
- Go to Resources → Templates
- Create a new Job Template:
- Name: k3s Cluster Updates
- Job Type: Run
- Inventory: Your k3s inventory
- Project: k3s Cluster Updates
- Playbook: update-k3s-cluster.yml
- Credentials: k3s SSH Key
- Options:
- Privilege Escalation: ✓
- Timeout: 3600 seconds
Create a schedule:
- Name: Weekly k3s Updates
- Run Frequency: Weekly on Sunday at 2:00 AM
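Under the hood, AWX stores schedules as iCal RRULE strings, so the weekly Sunday 2:00 AM schedule above corresponds to something like the following (the DTSTART date and timezone are placeholders; adjust to your environment):

```
DTSTART;TZID=America/New_York:20250105T020000 RRULE:FREQ=WEEKLY;INTERVAL=1;BYDAY=SU
```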
Usage
Run the playbook with different options locally:
# Update everything
ansible-playbook update-k3s-cluster.yml -i inventory.ini
# Skip worker updates
ansible-playbook update-k3s-cluster.yml -e "update_workers=false" -i inventory.ini
# Skip server updates
ansible-playbook update-k3s-cluster.yml -e "update_servers=false" -i inventory.ini
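The same toggles work from AWX: enable “Prompt on launch” for variables on the job template, or POST them to the template’s launch endpoint (`/api/v2/job_templates/<id>/launch/`). The request body is plain JSON, for example:

```json
{
  "extra_vars": {
    "update_workers": false
  }
}
```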
Benefits
- Minimal-disruption updates: nodes are drained and updated one at a time
- Safe handling of kube-vip migrations
- Automated scheduling through AWX
- Support for both Debian and RHEL-based systems
- Flexible update options for workers and servers