Automating Kubernetes Cluster Updates with Ansible and AWX

For initial cluster deployment, I recommend Techno Tim’s k3s-ansible playbook. It makes it easy to deploy a k3s cluster on Proxmox VMs in your homelab and handles the complexity of cluster setup, letting you focus on running workloads rather than infrastructure.
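
If you go that route, the flow looks roughly like this. This is a minimal sketch; the inventory layout and playbook name may differ between versions of the repo, so check its README first.

# Clone the playbook and copy the sample inventory
git clone https://github.com/techno-tim/k3s-ansible.git
cd k3s-ansible
cp -R inventory/sample inventory/my-cluster

# Edit inventory/my-cluster/hosts.ini and the group_vars to match your Proxmox VMs,
# then deploy the cluster
ansible-playbook site.yml -i inventory/my-cluster/hosts.ini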

Keeping a Kubernetes cluster up to date can be challenging, especially when you need to maintain high availability and handle components like kube-vip. This guide shows how to automate cluster updates using Ansible and AWX.

The Challenge

Updating a Kubernetes cluster requires careful orchestration to:

  • Maintain cluster availability during updates
  • Handle both worker and server nodes differently
  • Safely migrate critical components like kube-vip
  • Drain nodes before updates
  • Apply system updates
  • Restore nodes to service

The Solution: An Ansible Playbook

The solution is an Ansible playbook that handles these requirements with two main plays: one for worker nodes and one for server nodes.
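
Before touching any nodes, you can preview how the two plays are laid out with ansible-playbook's built-in listing flags (this assumes the playbook file is named update-k3s-cluster.yml, as in the usage examples later in this post):

# List plays and tasks without making any changes
ansible-playbook update-k3s-cluster.yml -i inventory.ini --list-tasks

# Show which hosts each play would target
ansible-playbook update-k3s-cluster.yml -i inventory.ini --list-hosts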

Worker Node Updates

The worker node play handles updating k3s worker nodes one at a time:

  • Drain the node to ensure no pods are running on it
  • Stop the k3s agent service
  • Apply system updates and reboot
  • Restart the k3s agent service
  • Uncordon the node so it can schedule pods again

- name: Update K3s Worker Nodes Serially
  hosts: k3s_workers
  serial: 1
  become: true
  vars:
    # Delegate kubectl commands to the first server node
    k3s_server: "{{ groups['k3s_servers'][0] }}"
    k3s_agent_service: "k3s-agent"

  tasks:
    - name: Check if workers should be updated
      meta: end_play
      when: not (update_workers | default(true) | bool)

    - name: Get node name from kubectl
      shell: kubectl get nodes -o name | grep -i {{ inventory_hostname }}
      delegate_to: "{{ k3s_server }}"
      become: true
      register: node_name
      changed_when: false

    - name: Drain the worker node
      command: >
        kubectl drain {{ node_name.stdout }}
        --ignore-daemonsets
        --delete-emptydir-data
      delegate_to: "{{ k3s_server }}"
      become: true

    - name: Stop K3s agent
      systemd:
        name: "{{ k3s_agent_service }}"
        state: stopped
      ignore_errors: true

    - name: Apply system updates (Debian-based)
      apt:
        update_cache: true
        upgrade: dist
      when: ansible_os_family == "Debian"

    - name: Apply system updates (RHEL-based)
      yum:
        name: "*"
        state: latest
      when: ansible_os_family == "RedHat"

    - name: Reboot the worker node
      reboot:
        reboot_timeout: 600

    - name: Wait for node to be reachable
      wait_for:
        host: "{{ ansible_host | default(inventory_hostname) }}"
        port: 22
        delay: 30
        timeout: 300

    - name: Restart K3s agent
      systemd:
        name: "{{ k3s_agent_service }}"
        state: started

    - name: Uncordon the worker node
      command: kubectl uncordon {{ node_name.stdout }}
      delegate_to: "{{ k3s_server }}"
      become: true

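After the play finishes on a worker, it is worth a quick manual check from any server node that the node rejoined and is schedulable again. The node name below is taken from the example inventory later in this post; substitute your own.

# The node should be Ready and no longer marked SchedulingDisabled
kubectl get node k3s-worker01

# Confirm pods are being scheduled onto it again
kubectl get pods -A -o wide --field-selector spec.nodeName=k3s-worker01
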
Server Node Updates

The server node play updates the k3s server nodes one at a time, with extra care around the kube-vip DaemonSet:

  • Pick another active server node to run kubectl commands against
  • Ensure kube-vip is configured to run on all control plane nodes
  • Drain the node to ensure no pods are running on it
  • Stop the k3s server service
  • Apply system updates and reboot
  • Restart the k3s server service
  • Uncordon the node so it can schedule pods again
  • Verify kube-vip is healthy before moving on to the next server

- name: Update K3s Server Nodes Serially
  hosts: k3s_servers
  serial: 1
  become: true
  vars:
    k3s_service: "k3s"
    # Run kubectl against another server, assuming at least two server nodes
    active_server: "{{ (groups['k3s_servers'] | difference([inventory_hostname])) | first }}"

  tasks:
    - name: Check if servers should be updated
      meta: end_play
      when: not (update_servers | default(true) | bool)

    - name: Get kube-vip DaemonSet name
      ansible.builtin.set_fact:
        kubevip_ds_name: "kube-vip-ds"
      tags: kubevip-config

    - name: Ensure kube-vip DaemonSet is set to desired count
      ansible.builtin.command:
        cmd: >-
          kubectl patch daemonset {{ kubevip_ds_name }} -n kube-system --type=json -p '[
            {"op":"replace","path":"/spec/template/spec/nodeSelector","value":{"node-role.kubernetes.io/master":"true"}},
            {"op":"replace","path":"/spec/template/spec/tolerations","value":[
              {"key":"node-role.kubernetes.io/master","operator":"Exists","effect":"NoSchedule"}
            ]}
          ]'
      delegate_to: "{{ active_server }}"
      become: true
      register: final_patch_result
      changed_when: final_patch_result.rc == 0
      tags: kubevip-config

    - name: Wait for kube-vip to reach desired count
      ansible.builtin.shell:
        cmd: "kubectl get daemonset {{ kubevip_ds_name }} -n kube-system -o jsonpath='{.status.desiredNumberScheduled}'"
      delegate_to: "{{ active_server }}"
      become: true
      register: final_kubevip_count
      until: final_kubevip_count.stdout | int == groups['k3s_servers'] | length
      retries: 30
      delay: 10
      changed_when: false
      tags: kubevip-config

    - name: Get node name from kubectl
      shell: kubectl get nodes -o name | grep -i {{ inventory_hostname }}
      delegate_to: "{{ active_server }}"
      become: true
      register: node_name_raw
      changed_when: false

    - name: Extract matching node name
      set_fact:
        node_name: "{{ node_name_raw.stdout_lines | first | replace('node/', '') }}"

    - name: Drain the server node
      # --delete-local-data has been removed from kubectl; --delete-emptydir-data replaces it
      command: >-
        kubectl drain {{ node_name }}
        --ignore-daemonsets
        --delete-emptydir-data
      delegate_to: "{{ active_server }}"
      become: true

    - name: Stop K3s server
      systemd:
        name: "{{ k3s_service }}"
        state: stopped

    - name: Apply system updates (Debian-based)
      apt:
        update_cache: true
        upgrade: dist
      when: ansible_os_family == "Debian"

    - name: Apply system updates (RHEL-based)
      yum:
        name: "*"
        state: latest
        update_cache: true
      when: ansible_os_family == "RedHat"

    - name: Reboot the server node
      reboot:
        reboot_timeout: 600
        test_command: uptime

    - name: Wait for server node to be reachable
      wait_for:
        host: "{{ ansible_host | default(inventory_hostname) }}"
        port: 22
        delay: 30
        timeout: 300

    - name: Restart K3s server
      systemd:
        name: "{{ k3s_service }}"
        state: started
        enabled: true

    - name: Wait for server node to be Ready
      command: >-
        kubectl wait --for=condition=Ready
        node/{{ node_name }}
        --timeout=300s
      delegate_to: "{{ active_server }}"
      become: true
      register: node_ready
      changed_when: false
      retries: 10
      delay: 30
      until: node_ready.rc == 0

    - name: Uncordon the server node
      ansible.builtin.command:
        cmd: "kubectl uncordon {{ node_name }}"
      delegate_to: "{{ active_server }}"
      become: true

    - name: Verify kube-vip DaemonSet is running on all control plane nodes
      shell: "kubectl get daemonset {{ kubevip_ds_name }} -n kube-system -o jsonpath='{.status.numberReady}'"
      delegate_to: "{{ active_server }}"
      become: true
      register: kubevip_ready_count
      until: kubevip_ready_count.stdout | int == groups['k3s_servers'] | length
      retries: 30
      delay: 10
      changed_when: false

    - name: Reapply kube-vip DaemonSet configuration if not running on all nodes
      ansible.builtin.command:
        cmd: >-
          kubectl patch daemonset {{ kubevip_ds_name }} -n kube-system --type=json -p '[
            {"op":"replace","path":"/spec/template/spec/nodeSelector","value":{"node-role.kubernetes.io/master":"true"}},
            {"op":"replace","path":"/spec/template/spec/tolerations","value":[
              {"key":"node-role.kubernetes.io/master","operator":"Exists","effect":"NoSchedule"}
            ]}
          ]'
      delegate_to: "{{ active_server }}"
      become: true
      when: kubevip_ready_count.stdout | int != groups['k3s_servers'] | length
      register: reapply_patch_result
      changed_when: reapply_patch_result.rc == 0
      tags: kubevip-config

    - name: Wait for kube-vip to be ready after reapply
      shell: "kubectl get daemonset {{ kubevip_ds_name }} -n kube-system -o jsonpath='{.status.numberReady}'"
      delegate_to: "{{ active_server }}"
      become: true
      register: final_kubevip_ready_count
      until: final_kubevip_ready_count.stdout | int == groups['k3s_servers'] | length
      retries: 30
      delay: 10
      changed_when: false
      when: reapply_patch_result is changed
      tags: kubevip-config

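Once all server nodes have been cycled, a final sanity check confirms the control plane is healthy and kube-vip is back on every server. The DaemonSet name kube-vip-ds matches what the play assumes; adjust it if your deployment differs.

# All server nodes should be Ready and schedulable
kubectl get nodes

# kube-vip should report one ready pod per server node
kubectl -n kube-system get daemonset kube-vip-ds
kubectl -n kube-system get pods -o wide | grep kube-vip
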
Create an Inventory File

[k3s_servers]
k3s-server01 ansible_host=192.168.10.71 ansible_user=ansible
k3s-server02 ansible_host=192.168.10.72 ansible_user=ansible
k3s-server03 ansible_host=192.168.10.73 ansible_user=ansible

[k3s_workers]
k3s-worker01 ansible_host=192.168.10.74 ansible_user=ansible
k3s-worker02 ansible_host=192.168.10.75 ansible_user=ansible
k3s-worker03 ansible_host=192.168.10.76 ansible_user=ansible
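
With the inventory in place, confirm Ansible can reach and escalate on every node before wiring anything into AWX:

# Show the group layout Ansible sees
ansible-inventory -i inventory.ini --graph

# Verify SSH connectivity and privilege escalation on all hosts
ansible all -i inventory.ini -m ping -b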

Automating with AWX

AWX is the open-source upstream of Red Hat Ansible Automation Platform (formerly Ansible Tower). It provides a web-based user interface and REST API for managing Ansible playbooks, and adds features like role-based access control, job scheduling, and detailed logging to your automation.

Project Setup

  1. Navigate to Resources → Projects
  2. Click Add
  3. Configure:
    • Name: k3s Cluster Updates
    • Source Control Type: Git
    • Source Control URL: Your repository URL
    • Update Revision on Launch: ✓
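
If you prefer to script this instead of clicking through the UI, the awx CLI (awxkit) can create the same project. Treat the command below as a sketch: the option names mirror the AWX API fields, and the organization and repository URL are placeholders for your own values, so check awx projects create --help against your AWX version.

# Placeholder organization and repository URL
awx projects create \
  --name "k3s Cluster Updates" \
  --organization "Default" \
  --scm_type git \
  --scm_url "https://github.com/your-user/your-repo.git" \
  --scm_update_on_launch true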

Inventory and Job Template Setup

  1. Go to Resources → Inventories
  2. Create a new inventory
  3. Go to Resources → Templates
  4. Create a new Job Template:
    • Name: k3s Cluster Updates
    • Job Type: Run
    • Inventory: Your k3s inventory
    • Project: k3s Cluster Updates
    • Playbook: update-k3s-cluster.yml
    • Credentials: k3s SSH Key
    • Options:
      • Privilege Escalation: ✓
      • Timeout: 3600 seconds
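
The job template can be scripted the same way. Again, a hedged sketch: the inventory, project, and credential names are placeholders that must already exist in AWX, the option names follow the AWX API fields, and the SSH credential is attached separately (in the UI or with the CLI's associate action).

awx job_templates create \
  --name "k3s Cluster Updates" \
  --job_type run \
  --inventory "k3s Cluster" \
  --project "k3s Cluster Updates" \
  --playbook update-k3s-cluster.yml \
  --become_enabled true \
  --timeout 3600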

Create a schedule:

  • Name: Weekly k3s Updates
  • Run Frequency: Weekly on Sunday at 2:00 AM
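
AWX stores schedules as iCal recurrence rules, so the same schedule can also be created from the CLI. The RRULE below is an assumption for "every Sunday at 02:00 UTC" and the job template ID (42) is a placeholder; look yours up with awx job_templates list and adjust the start time and timezone to your environment.

# Placeholder job template ID (42)
awx schedules create \
  --name "Weekly k3s Updates" \
  --unified_job_template 42 \
  --rrule "DTSTART:20250105T020000Z RRULE:FREQ=WEEKLY;INTERVAL=1;BYDAY=SU"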

Usage

Run the playbook with different options locally:

# Update everything
ansible-playbook update-k3s-cluster.yml -i inventory.ini

# Skip worker updates
ansible-playbook update-k3s-cluster.yml -e "update_workers=false" -i inventory.ini

# Skip server updates
ansible-playbook update-k3s-cluster.yml -e "update_servers=false" -i inventory.ini
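
Two standard ansible-playbook flags are also useful here: --limit to target a subset of hosts and --check for a dry run (note that check mode only previews changes, so tasks such as the reboot do nothing meaningful in it):

# Dry run against the workers only
ansible-playbook update-k3s-cluster.yml -i inventory.ini --limit k3s_workers --check

# Update a single node
ansible-playbook update-k3s-cluster.yml -i inventory.ini --limit k3s-worker01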

Benefits

  • Zero-downtime updates through serial execution
  • Safe handling of kube-vip migrations
  • Automated scheduling through AWX
  • Support for both Debian and RHEL-based systems
  • Flexible update options for workers and servers