Tech-Gen

How to remove and rejoin a faulty node in Aria Automation 8.x Cluster

https://knowledge.broadcom.com/external/article?articleNumber=345933

Products

VMware Aria Suite

Issue/Introduction

This article describes how to remove a faulty node from a VMware Aria Automation 8.x cluster and rejoin it.

Environment

VMware Aria Automation 8.x

Resolution

If a node has been determined to be faulty and must be removed from the cluster and rejoined, follow these steps.

In vCenter, take a backup snapshot of every appliance in the VMware Aria Automation HA configuration, without including memory.
From a root command line on any healthy node, run the following command to identify the node that hosts the primary database:

kubectl get pod `vracli status | jq -r '.databaseNodes[] | select(.["Role"] == "primary") | .["Node name"]' | cut -d '.' -f 1` -n prelude -o wide --no-headers=true

Example output:
postgres-0 1/1 Running 0 39h ##.###.#.## healthy_node-fqdn-xxx-xx.company.com <none> <none>

Important: The primary database node must be one of the healthy nodes. If the primary database node is faulty, contact technical support instead of proceeding.
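
The check described above can be scripted so that the procedure stops before anything is removed. The following is a minimal sketch, assuming the pod line has the same whitespace-separated layout as the example output (node name in the seventh column). The helper names primary_db_host and assert_primary_is_healthy are hypothetical and are not part of vracli or kubectl.

```shell
#!/bin/sh
# Hypothetical helpers; not part of vracli or kubectl.

# Print the node column (7th field) of one line of
# `kubectl get pod ... -o wide --no-headers` output.
# Assumes the column layout shown in the example output.
primary_db_host() {
    printf '%s\n' "$1" | awk '{ print $7 }'
}

# Succeed only when the primary database pod is NOT on the faulty node.
# $1 = pod line, $2 = FQDN of the faulty node
assert_primary_is_healthy() {
    host=$(primary_db_host "$1")
    if [ -z "$host" ]; then
        echo "Could not determine the primary database node." >&2
        return 2
    fi
    if [ "$host" = "$2" ]; then
        echo "Primary database is on the faulty node $2; contact technical support." >&2
        return 1
    fi
    echo "Primary database is on $host."
}
```

To use it, pass the output of the kubectl command above as the first argument and the faulty node's FQDN as the second, and continue only if the function returns 0.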

From the root command line of the healthy node, remove the faulty node.

vracli cluster remove faulty-node-FQDN

From the faulty node, join the VMware Aria Automation cluster.

vracli cluster join primary-DB-node-FQDN

Log in as root to the command line of the primary database node.
Deploy services on the cluster by running the following script:

/opt/scripts/deploy.sh

Verify that the node has joined the cluster and is in the "Ready" state by running the following command:

kubectl get nodes
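
The verification can also be done non-interactively. The sketch below is an illustration rather than part of the official procedure: it reads the output of kubectl get nodes --no-headers on standard input and assumes the usual column order (NAME, STATUS, ...). The function name all_nodes_ready is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `kubectl get nodes --no-headers` output on stdin.
# Succeeds only if at least one node is listed and every STATUS is exactly "Ready".
# Any other status (for example NotReady or Ready,SchedulingDisabled) is reported.
all_nodes_ready() {
    awk '
        NF { seen++ }
        NF && $2 != "Ready" { print "Not ready: " $1 " (" $2 ")"; bad++ }
        END { if (seen == 0 || bad > 0) exit 1 }
    '
}
```

A typical invocation would be: kubectl get nodes --no-headers | all_nodes_ready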

Additional Information

If the faulty node still has a damaged etcd database or other damaged Kubernetes components after it has been removed from the cluster, you can reset its Kubernetes system by running this command on the faulty node:

vracli cluster leave

This can allow the faulty node to join the cluster in cases where the vracli cluster join command above hangs indefinitely (giving no output after 10-15 minutes).
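
One way to detect such a hang without watching the terminal is to run the join with a deadline. This is only a sketch under two assumptions that the source does not confirm: that the coreutils timeout command is available on the appliance, and that interrupting a stalled join is acceptable in your environment. The function name run_with_deadline is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run a command with a deadline using coreutils `timeout`.
# $1 = deadline (for example 15m); remaining arguments = command to run.
# Returns the command's own exit status, or 124 if the deadline expired
# (124 is the status `timeout` uses when it terminates the command).
run_with_deadline() {
    deadline=$1
    shift
    timeout "$deadline" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "No completion within $deadline; the command may be hung." >&2
    fi
    return "$status"
}
```

For example, run_with_deadline 15m vracli cluster join primary-DB-node-FQDN returns 124 if the join has not finished after 15 minutes, at which point running vracli cluster leave on the faulty node and retrying the join is the next thing to consider.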
