
ESXi5 Command Line Reference – Networking Part1

This post explains how I troubleshot a recent issue that was caused by a pause flood on HP Virtual Connect. Most VMware administrators already know the basics of network troubleshooting: trying to reach the host via ping, and checking for physical NIC failures, cabling problems, switch port failures, or even router failures. This post does not cover those basic procedures. If you would like to get familiar with network-related commands in ESXi, please take a look at my previous posts:
ESXi5 Command Line Reference – Networking Part1
ESXi5 Command Line Reference – Networking Part2
I got an alert from the monitoring team that one of our ESXi hosts was not reachable on the network. My first thought was a PSOD (Purple Screen of Death), which I expected to fix by rebooting the host. When I connected to the host's iLO, the host was up, but I still could not reach it via ping. I suspected a problem with the network adapter, but that was not it. Next I checked the host's physical cabling, which was also fine, and the network team confirmed there were no switch port failures. I then checked the status of the host's network adapters from the iLO console, and it showed all NICs as down.
My ESXi host runs on a blade server, and we use HP Virtual Connect as the interconnect for the servers in the blade chassis. I suspected that something was wrong with Virtual Connect, so I decided to analyze the Virtual Connect logs. In the Virtual Connect system log I found the error message “Port was disabled because a pause flood was detected”.
Next I checked the port status of my Virtual Connect interconnect bays:
Connect to HP Virtual Connect Manager -> Hardware -> click Interconnect Bays -> click Bay 1 or Bay 2 -> check the status under the Server Ports tab. The port status was “Not Linked/Pause Flood Detected”, which confirmed that the issue was caused by a pause flood. In some cases, a Flex-10 port can enter a disabled state when pause flood protection or network loop protection is triggered.
You can confirm the same port status using the Virtual Connect Manager CLI.
Connect to Virtual Connect using SSH and execute the following command:
show port-protect
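If you run this check often, it can be scripted. The Python sketch below assumes two things that are not stated in this post: that your VC firmware accepts a single CLI command passed as an SSH argument (non-interactive mode), and that an affected port's row in the `show port-protect` output contains the text “Pause Flood”. The function names are hypothetical helpers, not part of any HP tool.

```python
import subprocess


def run_vc_command(host: str, user: str, command: str) -> str:
    """Run one VC CLI command over SSH and return its text output.

    Assumes the VC accepts the command as an SSH argument (non-interactive
    mode); ssh itself handles key or password authentication.
    """
    result = subprocess.run(
        ["ssh", f"{user}@{host}", command],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if ssh exits non-zero
    )
    return result.stdout


def pause_flood_lines(output: str) -> list[str]:
    """Return the output lines that mention a pause flood (case-insensitive)."""
    return [line.strip() for line in output.splitlines()
            if "pause flood" in line.lower()]
```

For example, `pause_flood_lines(run_vc_command("vc.example.local", "Administrator", "show port-protect"))` returns the flagged rows, where the host name and user are placeholders for your own environment. An empty list means no port matched.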

What is a Pause Flood

We now know this issue was caused by a pause flood, so let us look at what a pause flood is. Ethernet switch interfaces use pause-frame-based flow control to regulate data flow. When a pause frame is received on an interface with flow control enabled, transmission stops for the pause duration specified in the pause frame, and all other frames destined for that interface are queued. If another pause frame arrives before the previous pause timer expires, the timer is refreshed to the new pause duration. If a steady stream of pause frames is received for an extended period, the interface's transmit queue keeps growing until all queuing resources are exhausted. This condition severely impacts switch operation on other interfaces as well.
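To put numbers on this: in IEEE 802.3x flow control, the pause_time field of a pause frame is a 16-bit count of quanta, and one quantum is 512 bit times. The Python sketch below converts a pause_time value to seconds, then runs a toy per-millisecond model of a single transmit queue to show how refreshed pause timers let the queue grow without bound. The traffic rates and pause spacing are made-up illustrative values, not measurements from a real switch or from Virtual Connect.

```python
QUANTUM_BITS = 512  # one pause quantum = 512 bit times (IEEE 802.3x)


def pause_duration_s(quanta: int, link_bps: float) -> float:
    """Seconds a transmitter stays paused for a given pause_time value."""
    if not 0 <= quanta <= 0xFFFF:
        raise ValueError("pause_time is a 16-bit field (0..65535)")
    return quanta * QUANTUM_BITS / link_bps


def simulate_queue(pause_arrivals_ms, pause_ms, enqueue_per_ms, drain_per_ms, total_ms):
    """Toy model: queue depth (in frames) at the end of each millisecond.

    Each received pause frame restarts the pause timer with a fresh duration,
    so a steady stream of pause frames keeps the port paused indefinitely.
    While paused, frames keep arriving but nothing is transmitted.
    """
    arrivals = set(pause_arrivals_ms)
    paused_until = 0  # first millisecond at which transmission may resume
    depth = 0
    history = []
    for t in range(total_ms):
        if t in arrivals:
            paused_until = t + pause_ms  # refresh the timer, do not add to it
        depth += enqueue_per_ms
        if t >= paused_until:  # not paused: drain what the link can send
            depth = max(0, depth - drain_per_ms)
        history.append(depth)
    return history
```

For example, `pause_duration_s(65535, 10e9)` is about 0.00336 s, so at 10 Gb/s a sender needs only about 300 maximum-length pause frames per second to keep the link paused continuously. In the toy model, `simulate_queue(range(0, 100, 2), 3, 5, 10, 100)` never drains and ends at a depth of 500 frames, while the same traffic with no pause frames (`simulate_queue([], 3, 5, 10, 100)`) stays at 0.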
In addition, all protocol operations on the switch are impacted because protocol frames cannot be transmitted. Both port pause and priority-based pause frames can cause the same resource exhaustion. Virtual Connect (VC) can monitor server downlink ports for pause flood conditions and take protective action by disabling the port. The polling interval is 10 seconds and is not user configurable. VC provides system logs and SNMP traps for events related to pause flood detection. The feature operates at the physical port level: when a pause flood condition is detected on a Flex-10 physical port, all Flex-10 logical ports associated with that physical port are disabled. In short, when pause flood protection is enabled, VC detects pause flood conditions on server downlink ports and disables the affected ports.
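The protection behaviour described above can be modelled in a few lines. This Python sketch is a simplified, hypothetical model, not VC's actual implementation: the post does not describe VC's detection criterion, so the per-poll pause-frame threshold, the counter fields, and the FlexNIC names are all assumptions. What it does take from the text is the polling cycle, the physical-port scope (disabling a physical port takes all of its logical ports down), and the fact that a port stays disabled until an administrator resets it.

```python
from dataclasses import dataclass


@dataclass
class PhysicalPort:
    name: str
    flexnics: list[str]          # logical Flex-10 ports carved from this port
    pause_rx_total: int = 0      # cumulative received pause-frame counter
    pause_rx_last_poll: int = 0  # counter value at the previous poll
    disabled: bool = False

    def logical_ports_up(self) -> list[str]:
        # Protection acts on the physical port, so every FlexNIC shares its state.
        return [] if self.disabled else list(self.flexnics)


def poll(ports: list[PhysicalPort], threshold: int) -> list[str]:
    """One polling cycle (every 10 seconds on VC, per the text).

    Disables any port whose pause counter grew by at least `threshold` since
    the last poll. `threshold` is hypothetical; the returned messages are
    simplified and are not real VC log text.
    """
    events = []
    for port in ports:
        delta = port.pause_rx_total - port.pause_rx_last_poll
        port.pause_rx_last_poll = port.pause_rx_total
        if not port.disabled and delta >= threshold:
            port.disabled = True
            events.append(f"{port.name}: pause flood detected, "
                          f"{len(port.flexnics)} logical ports disabled")
    return events


def reset_port_protect(ports: list[PhysicalPort]) -> None:
    """Administrative action: re-enable every protected port."""
    for port in ports:
        port.disabled = False
```

Once `poll` disables a port, later polls leave it disabled even if the pause frames stop; only `reset_port_protect` (the model's stand-in for the `reset port-protect` command) brings its logical ports back. If the NIC is still flooding after the reset, the next poll disables the port again, which is why the NIC itself also needs fixing.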

How to Fix the Pause Flood Issue:

The port remains disabled until an administrative action is taken. The administrative action involves the following steps:
Action Plan 1: The temporary, immediate fix is to re-enable the disabled ports on the VC interconnect modules as follows:
1. Connect to your Virtual Connect using SSH.
2. Execute the following command:
reset port-protect
3. Verify the port status again using the following command, and make sure no port's protect type is reported as “Pause Flood”:
show port-protect
That's it. The commands above fixed my issue immediately.
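Action Plan 1 can also be wrapped in a small script so that the reset is always followed by the verification step. In this Python sketch, the `run` argument is any function that sends one VC CLI command and returns its text output (for example, an SSH wrapper). That interface, and the assumption that an affected port's row contains the text “Pause Flood”, are illustrative rather than taken from HP documentation. Depending on your VC firmware, `reset port-protect` may behave differently in a non-interactive session, so test it interactively first.

```python
from typing import Callable


def reset_and_verify(run: Callable[[str], str]) -> list[str]:
    """Re-enable protected ports, then return any rows still flagged.

    `run` sends one VC CLI command and returns its output (hypothetical
    interface). An empty list means no port reports a pause flood after
    the reset.
    """
    run("reset port-protect")          # step 2: re-enable disabled ports
    output = run("show port-protect")  # step 3: check the status again
    return [line.strip() for line in output.splitlines()
            if "pause flood" in line.lower()]
```

Because `run` is passed in, the sequence can be tried against a fake runner before pointing it at a real Virtual Connect. If the list is not empty after a reset, or ports keep getting disabled again, the NIC is probably still sending pause frames.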
Action Plan 2: Update the drivers and firmware
Fix the server NIC that is generating the continuous pause frames. This may include updating the NIC firmware and device drivers.
I tried Action Plan 1, and my ESXi host immediately started responding to ping again, which resolved my issue. I hope this is informative for you. Thanks for reading! Be social and share it on social media if you feel it is worth sharing.
