
Aria Automation nodes in not ready state and deploy.sh fails


https://knowledge.broadcom.com/external/article/377044/contact-broadcom-support.html

Products

VMware Aria Suite

Issue/Introduction

  • Unable to shut down Aria Automation by running /opt/scripts/deploy.sh --shutdown
  • The deploy.sh script fails with the following error:

Running check eth0-ip

Running check node-name

Running check non-default-hostname

Running check single-aptr

Running check nodes-ready
make: *** [/opt/health/Makefile:56: nodes-ready] Error 1
Running check nodes-count

Running check fips

make: Target 'deploy' not remade because of errors.

 

  • Running kubectl get nodes shows one node in a NotReady state
  • Running kubectl -n prelude get pods -o wide shows the postgres-0 pod in a pending state
  • Running kubectl describe nodes | grep "Name:\|Taints:" shows that the node hosting postgres-0 is tainted
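The first two symptom checks above can also be scripted. This is a minimal sketch that assumes a POSIX shell with awk and the default column layout of kubectl get nodes (NAME, STATUS, ...) and kubectl get pods (NAME, READY, STATUS, ...). The function names notready_nodes and postgres_status are hypothetical and are not shipped with Aria Automation.

```shell
#!/bin/sh
# Hypothetical helper: print the NAME of every node whose STATUS starts
# with "NotReady". Expects `kubectl get nodes --no-headers` on stdin
# (column 1 = NAME, column 2 = STATUS).
notready_nodes() {
  awk '$2 ~ /^NotReady/ { print $1 }'
}

# Hypothetical helper: print the STATUS of the postgres-0 pod.
# Expects `kubectl -n prelude get pods -o wide --no-headers` on stdin
# (column 1 = NAME, column 2 = READY, column 3 = STATUS).
postgres_status() {
  awk '$1 == "postgres-0" { print $3 }'
}

# Example usage on an Aria Automation node:
#   kubectl get nodes --no-headers | notready_nodes
#   kubectl -n prelude get pods -o wide --no-headers | postgres_status
```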


Environment

  • Aria Automation 8.x three node cluster

Cause

  • One of the nodes is tainted and in a NotReady state. This causes the nodes-ready health check to fail when deploy.sh is run.

Resolution

  • Work around this issue by removing the taint.
  • Workaround Steps: 
     
    • Run kubectl get nodes to determine which node is in a NotReady state.
    • Run kubectl -n prelude get pods -o wide to verify that the postgres-0 pod is in a Pending state.
    • Run kubectl describe nodes | grep "Name:\|Taints:" to verify that the node hosting postgres-0 is tainted:

root@servername01 [ ~ ]# kubectl describe nodes | grep "Name:\|Taints:"
Name:               servername01.example.local
Taints:             node.kubernetes.io/unreachable:NoSchedule
Name:               servername02.example.local
Taints:             <none>
Name:               servername03.example.local
Taints:             <none>
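A kubectl taint removal command is the taint string from this output followed by a trailing "-", so it can be generated rather than typed. A minimal sketch, assuming the grep output format shown above; taint_removal_cmds is a hypothetical helper that prints the commands instead of running them, so they can be reviewed first. It only sees the first taint listed per node, because the grep keeps only the line that begins with "Taints:".

```shell
#!/bin/sh
# Hypothetical helper: read the output of
#   kubectl describe nodes | grep "Name:\|Taints:"
# on stdin and print one `kubectl taint` removal command per tainted node.
# Appending "-" to the taint tells kubectl to remove it.
# Commands are printed, not executed, so they can be reviewed first.
taint_removal_cmds() {
  awk '/^Name:/ { node = $2 }
       /^Taints:/ && $2 != "<none>" {
         printf "kubectl taint nodes %s %s-\n", node, $2
       }'
}

# Example usage:
#   kubectl describe nodes | grep "Name:\|Taints:" | taint_removal_cmds
```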

    • Remove the taint with the following command, replacing servername01.example.local with the tainted node identified in the previous output. The trailing "-" after the taint tells kubectl to remove it rather than add it:

      kubectl taint nodes servername01.example.local node.kubernetes.io/unreachable:NoSchedule-
    • Run the describe command again to verify that no nodes are tainted:

      kubectl describe nodes | grep "Name:\|Taints:" 

The output should now show no taints:

root@servername01 [ ~ ]# kubectl describe nodes | grep "Name:\|Taints:"
Name:               servername01.example.local
Taints:             <none>
Name:               servername02.example.local
Taints:             <none>
Name:               servername03.example.local
Taints:             <none>
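Instead of reading this output by eye, the check can be turned into a pass/fail test. A minimal sketch, assuming the same grep output format; no_taints is a hypothetical helper that exits 0 only when every "Taints:" line reads <none>.

```shell
#!/bin/sh
# Hypothetical helper: read the output of
#   kubectl describe nodes | grep "Name:\|Taints:"
# on stdin. Exit 0 if every "Taints:" line reads <none>; otherwise print
# each node that still has a taint and exit 1.
no_taints() {
  awk '/^Name:/ { node = $2 }
       /^Taints:/ && $2 != "<none>" { print "still tainted: " node " " $2; bad = 1 }
       END { exit (bad ? 1 : 0) }'
}

# Example usage:
#   kubectl describe nodes | grep "Name:\|Taints:" | no_taints && echo "no taints"
```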

Note: In some cases the taint may still appear. If so, rerun the taint removal command until the taint no longer appears on any node.
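The rerun-until-clear loop in the note can be wrapped in a bounded retry. This is a sketch under assumptions: retry and remove_and_check are hypothetical helpers, the attempt count and delay in the example are arbitrary, and the node name and taint are the ones from the sample output, so substitute your own.

```shell
#!/bin/sh
# Hypothetical helper: run a command until it succeeds, up to
# max_attempts times, sleeping delay_s seconds between attempts.
# Returns 0 on the first success and 1 if every attempt fails.
retry() {
  local max_attempts="$1" delay_s="$2" attempt=1
  shift 2
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt/$max_attempts failed: $*" >&2
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay_s"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Example usage (node name and taint taken from the sample output;
# replace them with your own). The exit status of the final grep decides
# success, so an error from `kubectl taint` once the taint is already gone
# does not make the check fail.
#   remove_and_check() {
#     kubectl taint nodes servername01.example.local node.kubernetes.io/unreachable:NoSchedule-
#     ! kubectl describe node servername01.example.local | grep -q "node.kubernetes.io/unreachable:NoSchedule"
#   }
#   retry 5 10 remove_and_check
```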

    • Run kubectl get nodes again.
    • Once all three nodes show as "Ready", run the shutdown command again:

/opt/scripts/deploy.sh --shutdown
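Before rerunning the shutdown, the "all three nodes Ready" condition can be enforced with a guard. A minimal sketch, assuming kubectl get nodes --no-headers output with STATUS in the second column; all_nodes_ready is a hypothetical helper that defaults to the three-node count from the Environment section and treats any status other than exactly "Ready" (for example Ready,SchedulingDisabled) as not ready.

```shell
#!/bin/sh
# Hypothetical pre-shutdown guard. Reads `kubectl get nodes --no-headers`
# on stdin and succeeds only if exactly the expected number of nodes is
# listed (first argument, default 3) and every STATUS (column 2) is
# exactly "Ready". A cordoned node (Ready,SchedulingDisabled) fails the check.
all_nodes_ready() {
  awk -v want="${1:-3}" '
    { total++; if ($2 != "Ready") notready++ }
    END { exit ((total == want + 0 && notready == 0) ? 0 : 1) }'
}

# Example usage: only rerun the shutdown once all three nodes are Ready.
#   kubectl get nodes --no-headers | all_nodes_ready 3 && /opt/scripts/deploy.sh --shutdown
```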
