Aria Automation nodes in NotReady state and deploy.sh fails
https://knowledge.broadcom.com/external/article/377044/contact-broadcom-support.html
Issue/Introduction
- Unable to shut down Aria Automation by running
/opt/scripts/deploy.sh --shutdown
- deploy.sh script fails with the following error:
Running check eth0-ip
Running check node-name
Running check non-default-hostname
Running check single-aptr
Running check nodes-ready
make: *** [/opt/health/Makefile:56: nodes-ready] Error 1
Running check nodes-count
Running check fips
make: Target 'deploy' not remade because of errors.
- Running
kubectl get nodes
shows one node in a NotReady state
- Running
kubectl -n prelude get pods -o wide
shows the postgres-0 pod in a Pending state
- Running
kubectl describe nodes | grep "Name:\|Taints:"
shows that the node where postgres-0 is running is tainted
Environment
- Aria Automation 8.x three node cluster
Cause
- One of the nodes is tainted and is therefore in a NotReady state. This causes the health check scripts to fail when deploy.sh is run.
Resolution
- Work around this issue by removing the taint.
- Workaround Steps:
- Run
kubectl get nodes
to determine which node is in a NotReady state
- Run
kubectl -n prelude get pods -o wide
to verify that the postgres-0 pod is in a Pending state
- Run
kubectl describe nodes | grep "Name:\|Taints:"
to verify that the node where postgres-0 is running is tainted
- Example output:
root@servername01 [ ~ ]# kubectl describe nodes | grep "Name:\|Taints:"
Name: servername01.example.local
Taints: node.kubernetes.io/unreachable:NoSchedule
Name: servername02.example.local
Taints: <none>
Name: servername03.example.local
Taints: <none>
- Run this to remove the taint (replace the relevant servername in the command with the tainted node in your environment from the above commands):
kubectl taint nodes servername01.example.local node.kubernetes.io/unreachable:NoSchedule-
- Run this again to verify that no nodes are tainted:
kubectl describe nodes | grep "Name:\|Taints:"
It should now show that there is no taint:
root@servername01 [ ~ ]# kubectl describe nodes | grep "Name:\|Taints:"
Name: servername01.example.local
Taints: <none>
Name: servername02.example.local
Taints: <none>
Name: servername03.example.local
Taints: <none>
Note: In some cases the taint may still show up. Rerun the removal and verification commands until the taint no longer appears on the affected node or on any of the other nodes.
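The remove-and-verify loop above can be sketched as a small shell helper. This is only an illustration, not part of Aria Automation: the tainted_nodes function name is our own, and it simply parses the Name:/Taints: pairs printed by the kubectl describe command shown above.

```shell
#!/bin/sh
# tainted_nodes is a hypothetical helper (not part of Aria Automation):
# it reads the "Name:" / "Taints:" pairs produced by
#   kubectl describe nodes | grep "Name:\|Taints:"
# on stdin and prints the name of every node whose taint is not <none>.
tainted_nodes() {
  awk '/^Name:/   { name = $2 }
       /^Taints:/ { if ($2 != "<none>") print name }'
}

# Sketch of the remove-and-verify loop (requires a working kubectl
# context; the 5-second interval is arbitrary): keep removing the
# unreachable taint until no node reports one.
# while nodes=$(kubectl describe nodes | grep "Name:\|Taints:" | tainted_nodes); \
#       [ -n "$nodes" ]; do
#   for node in $nodes; do
#     kubectl taint nodes "$node" node.kubernetes.io/unreachable:NoSchedule-
#   done
#   sleep 5
# done
```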
- Now run
kubectl get nodes
- After all three nodes show as "Ready", the shutdown command can be run again:
/opt/scripts/deploy.sh --shutdown
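The final check-then-shutdown step can also be sketched as a short script. The all_ready helper name is ours, not part of the product; it only inspects the STATUS column of kubectl get nodes --no-headers output, and the shutdown call at the end is the same deploy.sh command described above.

```shell
#!/bin/sh
# all_ready is a hypothetical helper (not part of Aria Automation):
# it reads `kubectl get nodes --no-headers` output on stdin and
# succeeds only if every node's STATUS column (field 2) is "Ready".
all_ready() {
  awk '$2 != "Ready" { bad = 1 } END { exit bad }'
}

# Sketch: poll until all nodes report Ready, then shut down.
# (Requires a working kubectl context; the interval is arbitrary.)
# until kubectl get nodes --no-headers | all_ready; do
#   sleep 10
# done
# /opt/scripts/deploy.sh --shutdown
```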