Aria Automation nodes in NotReady state and deploy.sh fails
https://knowledge.broadcom.com/external/article/377044/contact-broadcom-support.html
Issue/Introduction
- Unable to shut down Aria Automation by running
/opt/scripts/deploy.sh --shutdown
- deploy.sh script fails with the following error:
Running check eth0-ip
Running check node-name
Running check non-default-hostname
Running check single-aptr
Running check nodes-ready
make: *** [/opt/health/Makefile:56: nodes-ready] Error 1
Running check nodes-count
Running check fips
make: Target 'deploy' not remade because of errors.
- Running
kubectl get nodes
shows one node in a NotReady state
- Running
kubectl -n prelude get pods -o wide
shows the postgres-0 pod in a Pending state
- Running
kubectl describe nodes | grep "Name:\|Taints:"
shows that the node where postgres-0 is running is tainted
Environment
- Aria Automation 8.x three node cluster
Cause
- One of the nodes is tainted and is therefore in a NotReady state. This causes the health check scripts to fail when deploy.sh is run.
Resolution
- Work around this issue by removing the taint.
- Workaround Steps:
- Run
kubectl get nodes
to determine which node is in a NotReady state
- Run
kubectl -n prelude get pods -o wide
to verify that the postgres-0 pod is in a Pending state
- Run
kubectl describe nodes | grep "Name:\|Taints:"
to verify that the node where postgres-0 is running is tainted
- Example output:
root@servername01 [ ~ ]# kubectl describe nodes | grep "Name:\|Taints:"
Name: servername01.example.local
Taints: node.kubernetes.io/unreachable:NoSchedule
Name: servername02.example.local
Taints: <none>
Name: servername03.example.local
Taints: <none>
- Run this to remove the taint (replace the relevant servername in the command with the tainted node in your environment from the above commands):
kubectl taint nodes servername01.example.local node.kubernetes.io/unreachable:NoSchedule-
- Run this again to verify that no nodes are tainted:
kubectl describe nodes | grep "Name:\|Taints:"
It should now show that there is no taint:
root@servername01 [ ~ ]# kubectl describe nodes | grep "Name:\|Taints:"
Name: servername01.example.local
Taints: <none>
Name: servername02.example.local
Taints: <none>
Name: servername03.example.local
Taints: <none>
Note: In some cases the taint may still show up. Rerun the removal and verification commands until the taint no longer appears on the affected node or on any of the other nodes.
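The remove-and-verify loop above can be sketched as a small shell helper. This is only an illustration, not part of Aria Automation: the tainted_nodes function name is our own, and it simply parses the Name:/Taints: pairs printed by the kubectl describe command shown above.

```shell
#!/bin/sh
# tainted_nodes is a hypothetical helper (not part of Aria Automation):
# it reads the "Name:" / "Taints:" pairs produced by
#   kubectl describe nodes | grep "Name:\|Taints:"
# on stdin and prints the name of every node whose taint is not <none>.
tainted_nodes() {
  awk '/^Name:/   { name = $2 }
       /^Taints:/ { if ($2 != "<none>") print name }'
}

# Sketch of the remove-and-verify loop (requires a working kubectl
# context; the 5-second interval is arbitrary): keep removing the
# unreachable taint until no node reports one.
# while nodes=$(kubectl describe nodes | grep "Name:\|Taints:" | tainted_nodes); \
#       [ -n "$nodes" ]; do
#   for node in $nodes; do
#     kubectl taint nodes "$node" node.kubernetes.io/unreachable:NoSchedule-
#   done
#   sleep 5
# done
```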
- Now run
kubectl get nodes
- After all three nodes show as "Ready", the shutdown command can be run again:
/opt/scripts/deploy.sh --shutdown
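The final check-then-shutdown step can also be sketched as a short script. The all_ready helper name is ours, not part of the product; it only inspects the STATUS column of kubectl get nodes --no-headers output, and the shutdown call at the end is the same deploy.sh command described above.

```shell
#!/bin/sh
# all_ready is a hypothetical helper (not part of Aria Automation):
# it reads `kubectl get nodes --no-headers` output on stdin and
# succeeds only if every node's STATUS column (field 2) is "Ready".
all_ready() {
  awk '$2 != "Ready" { bad = 1 } END { exit bad }'
}

# Sketch: poll until all nodes report Ready, then shut down.
# (Requires a working kubectl context; the interval is arbitrary.)
# until kubectl get nodes --no-headers | all_ready; do
#   sleep 10
# done
# /opt/scripts/deploy.sh --shutdown
```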