

This post describes how I troubleshot a recent issue that was caused by a pause flood on HP Virtual Connect. Most VMware administrators already know basic network troubleshooting: pinging the host and checking for physical NIC failures, cabling problems, switch port failures, or even router failures. This post does not walk through those basic procedures. If you would like to get familiar with network-related commands in ESXi, please take a look at my previous posts:
ESXi5 Command Line Reference – Networking Part1
ESXi5 Command Line Reference – Networking Part2
I got an alert from the monitoring team that one of our ESXi hosts was not reachable on the network. My first thought was a PSOD (Purple Screen of Death), which I expected to fix by rebooting the host. When I connected to the host's iLO, however, the host was up. I tried to ping it, but it was not reachable. I suspected a problem with the network adapter, but that was not it. Next I checked the physical cabling of the host, and that was fine too. I checked with the network team for switch port failures, and the switch ports were also fine. Finally, I checked the status of the host's network adapters from iLO, and it showed all NICs as down.
[Screenshot: ESXi network adapters shown as down]
My ESXi host runs on a blade server, and we use HP Virtual Connect as the interconnect for the servers in the blade chassis. I suspected something could be wrong with Virtual Connect, so I decided to analyze the HP Virtual Connect logs. In the Virtual Connect system log I found the error message “Port was disabled because a pause flood was detected”.
[Screenshot: Virtual Connect system log showing the pause flood error]
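If the Virtual Connect system log has been exported to a text file, the pause flood events can also be picked out with a short script. This is a minimal Python sketch, assuming one log event per line; it matches only the message text quoted above (case-insensitively) and assumes nothing about the rest of the line layout. The file name in the usage comment is hypothetical.

```python
import sys
from typing import Iterable, List

# Message text as reported by Virtual Connect; matched case-insensitively.
PAUSE_FLOOD_MARKER = "pause flood was detected"


def find_pause_flood_events(lines: Iterable[str]) -> List[str]:
    """Return the log lines reporting a port disabled because of a pause flood."""
    return [line.rstrip("\n") for line in lines if PAUSE_FLOOD_MARKER in line.lower()]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_pause_flood.py exported-vc-log.txt  (file name is hypothetical)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        for event in find_pause_flood_events(log_file):
            print(event)
```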
Next, I checked the port status of the Virtual Connect interconnect bays:
Connect to HP Virtual Connect -> Hardware -> click Interconnect Bays -> click Bay 1 or Bay 2 -> check the status on the Server Ports tab. The port showed the status “Not Linked/Pause Flood Detected”, which confirmed that the issue was caused by a pause flood. In some cases, a Flex-10 port can enter a disabled state when a pause flood or a network loop triggers port protection.
[Screenshot: Server Ports tab showing “Not Linked/Pause Flood Detected”]
You can confirm the same port status from the Virtual Connect Manager CLI.
Connect to Virtual Connect using SSH and run the following command:
show port-protect
[Screenshot: output of show port-protect]
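To check the port status from a script rather than an interactive session, a minimal Python sketch can hand the command to the system `ssh` client. It assumes the VC CLI accepts a single command passed on the ssh command line (non-interactive use) and that login is already set up; the host name and account below are placeholders, not values from this environment.

```python
import subprocess
from typing import List


def build_vc_ssh_command(host: str, user: str, vc_command: str) -> List[str]:
    """Build an OpenSSH argument list that runs one VC CLI command."""
    return ["ssh", f"{user}@{host}", vc_command]


def run_vc_command(host: str, user: str, vc_command: str, timeout: float = 60.0) -> str:
    """Run a VC CLI command over SSH and return its standard output.

    Raises subprocess.CalledProcessError if ssh exits non-zero and
    subprocess.TimeoutExpired if the command takes longer than `timeout`.
    """
    completed = subprocess.run(
        build_vc_ssh_command(host, user, vc_command),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return completed.stdout


# Example (hypothetical host and account):
# print(run_vc_command("vc-domain.example.com", "Administrator", "show port-protect"))
```

Passing an argument list instead of a shell string avoids quoting problems, and keeping the list builder separate makes the exact ssh invocation easy to inspect.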

What Is a Pause Flood?

We now know the issue was caused by a pause flood, so let us look at what a pause flood is. Ethernet switch interfaces use pause-frame-based flow control to regulate data flow. When a pause frame is received on a flow-control-enabled interface, transmission stops for the pause duration specified in the pause frame, and all other frames destined for this interface are queued. If another pause frame arrives before the previous pause timer expires, the timer is refreshed to the new pause duration. If a steady stream of pause frames is received for an extended period, the transmit queue for that interface keeps growing until all queuing resources are exhausted. This condition severely impacts switch operation on other interfaces.
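The effect of a constantly refreshed pause timer can be illustrated with a toy model. This Python sketch is an illustration only, not how any switch or VC module is implemented. It uses the IEEE 802.3x definition of the pause time (a 16-bit count of quanta, each 512 bit times) and assumes a hypothetical 10 Gb/s link, 1 ms simulation steps, and made-up queue sizes and frame rates. With a single pause frame the queue drains once the timer expires; with a pause frame every step the queue grows until the buffers are exhausted.

```python
from dataclasses import dataclass

# IEEE 802.3x: the pause_time field counts "quanta" of 512 bit times each.
BITS_PER_QUANTUM = 512
MAX_PAUSE_QUANTA = 0xFFFF  # pause_time is a 16-bit field


def pause_quanta_to_seconds(quanta: int, link_bps: float) -> float:
    """Convert a pause frame's pause_time value to seconds at a given link speed."""
    if not 0 <= quanta <= MAX_PAUSE_QUANTA:
        raise ValueError("pause_time must fit in 16 bits")
    return quanta * BITS_PER_QUANTUM / link_bps


@dataclass
class FlowControlledPort:
    """Toy model of one flow-control-enabled interface's transmit queue."""
    queue_capacity: int           # hypothetical number of frame buffers
    queued: int = 0
    pause_remaining: float = 0.0  # seconds left on the pause timer
    exhausted: bool = False

    def receive_pause(self, duration_s: float) -> None:
        # A new pause frame replaces (refreshes) whatever time was left.
        self.pause_remaining = duration_s

    def tick(self, dt_s: float, arriving: int, drain_capacity: int) -> None:
        """Advance by dt_s. Simplification: nothing is sent in a step that starts paused."""
        self.queued += arriving
        if self.pause_remaining > 0:
            self.pause_remaining = max(0.0, self.pause_remaining - dt_s)
        else:
            self.queued = max(0, self.queued - drain_capacity)
        if self.queued >= self.queue_capacity:
            self.exhausted = True               # queuing resources used up
            self.queued = self.queue_capacity   # further frames cannot be queued


def simulate(steps: int, pause_flood: bool) -> FlowControlledPort:
    """Run the toy model on a hypothetical 10 Gb/s link with 1 ms steps."""
    link_bps = 10e9
    max_pause_s = pause_quanta_to_seconds(MAX_PAUSE_QUANTA, link_bps)  # about 3.36 ms
    port = FlowControlledPort(queue_capacity=10_000)
    for step in range(steps):
        if pause_flood or step == 0:  # one pause frame total, or one every step
            port.receive_pause(max_pause_s)
        port.tick(dt_s=1e-3, arriving=100, drain_capacity=200)
    return port


print(simulate(50, pause_flood=False))  # queue drains after the single pause expires
print(simulate(200, pause_flood=True))  # queue reaches capacity and stays exhausted
```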
In addition, all protocol operations on the switch are impacted because protocol frames cannot be transmitted. Both port pause and priority-based pause frames can cause the same resource exhaustion. VC can monitor server downlink ports for pause flood conditions and take protective action by disabling the port. The polling interval is 10 seconds and is not user configurable. VC provides system logs and SNMP traps for events related to pause flood detection. This feature operates at the physical port level: when a pause flood condition is detected on a Flex-10 physical port, all Flex-10 logical ports associated with that physical port are disabled. When the pause flood protection feature is enabled, it detects pause flood conditions on server downlink ports and disables the affected port.
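The protective behavior described above (polling, acting on the physical port, taking down every Flex-10 logical port on it, and staying down until an administrator intervenes) can be modeled in a few lines. This is a hypothetical Python sketch: the post does not describe how VC decides that a pause flood is occurring, so the `threshold` of pause frames per polling interval and the port names are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ServerDownlink:
    """A physical server downlink port and the Flex-10 logical ports carved from it."""
    name: str
    logical_ports: List[str]
    disabled: bool = False

    def logical_port_states(self) -> Dict[str, str]:
        # Protection acts at the physical level, so every logical port shares its state.
        state = "disabled" if self.disabled else "enabled"
        return {port: state for port in self.logical_ports}


def poll_once(downlinks: List[ServerDownlink],
              pause_frames_seen: Dict[str, int],
              threshold: int) -> List[str]:
    """One polling pass (each call stands for one 10-second interval).

    `threshold` is a hypothetical stand-in for VC's real detection criterion.
    Returns the names of downlinks disabled in this pass.
    """
    newly_disabled = []
    for link in downlinks:
        if not link.disabled and pause_frames_seen.get(link.name, 0) >= threshold:
            link.disabled = True  # stays down until an administrator resets it
            newly_disabled.append(link.name)
    return newly_disabled


def reset_port_protect(downlinks: List[ServerDownlink]) -> None:
    """Models the administrative re-enable step (the VC CLI's `reset port-protect`)."""
    for link in downlinks:
        link.disabled = False
```

Note that a later poll with no pause frames does not re-enable a port in this model; only the explicit reset does, matching the manual recovery described in the next section.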

How to Fix the Pause Flood Issue:

The port remains disabled until an administrative action is taken. The administrative action involves the following steps:
Action Plan 1: The temporary, immediate fix is to re-enable the disabled ports on the VC interconnect modules:
1. Connect to Virtual Connect using SSH.
2. Run the following command:
reset port-protect
3. Verify the port status again with the following command and make sure no port's protect type is reported as “Pause Flood”:
show port-protect
That's it: the reset command fixed my issue immediately.
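The reset-and-verify steps above can be scripted. This minimal Python sketch assumes the VC CLI accepts one command per ssh invocation; the host and account are placeholders. The check only looks for the text “Pause Flood” in the `show port-protect` output, as step 3 describes, and assumes nothing else about the output layout. The command runner is passed in as a function so the logic can be tried without a live VC domain. If your VC firmware asks for confirmation or behaves differently in non-interactive mode, run the commands interactively instead.

```python
import subprocess
from typing import Callable, List


def flagged_lines(show_output: str) -> List[str]:
    """Lines of `show port-protect` output that still mention a pause flood."""
    return [line.strip() for line in show_output.splitlines() if "pause flood" in line.lower()]


def ssh_runner(host: str, user: str, timeout: float = 60.0) -> Callable[[str], str]:
    """Return a function that runs one VC CLI command over ssh and returns stdout."""
    def run(vc_command: str) -> str:
        completed = subprocess.run(
            ["ssh", f"{user}@{host}", vc_command],
            capture_output=True, text=True, timeout=timeout, check=True,
        )
        return completed.stdout
    return run


def reset_and_verify(run: Callable[[str], str]) -> bool:
    """Step 2: reset port protection. Step 3: confirm nothing is still flagged."""
    run("reset port-protect")
    remaining = flagged_lines(run("show port-protect"))
    for line in remaining:
        print("still reported as pause flood:", line)
    return not remaining


# Example (hypothetical host and account):
# ok = reset_and_verify(ssh_runner("vc-domain.example.com", "Administrator"))
```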
Action Plan 2: Update the Drivers and Firmware
Resolve the underlying problem with the server NIC that is continuously generating pause frames. This might include updating the NIC firmware and device drivers.
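Before and after updating, it helps to record each NIC's driver and firmware versions. On ESXi releases that provide `esxcli network nic get -n <vmnic>`, the output consists of indented `Key: Value` lines, including a `Driver Info` section with `Driver`, `Firmware Version` and `Version`, plus `Pause RX`/`Pause TX` flow control flags. This Python sketch assumes that layout (field names can vary between releases) and needs Python 3.7 or later wherever `esxcli` is run; the parser also works on output saved from the host. The vmnic name is a placeholder.

```python
import subprocess
from typing import Dict

# Fields worth recording before and after a driver/firmware update.
FIELDS_OF_INTEREST = ("Driver", "Version", "Firmware Version", "Link Status", "Pause RX", "Pause TX")


def parse_nic_get(output: str) -> Dict[str, str]:
    """Flatten the indented `Key: Value` lines of `esxcli network nic get` into a dict.

    Section headers with no value (such as "Driver Info:") are skipped, and only
    the first ':' on a line separates key from value, so values such as PCI bus
    addresses (0000:02:00.0) stay intact.
    """
    info: Dict[str, str] = {}
    for raw_line in output.splitlines():
        key, sep, value = raw_line.strip().partition(":")
        if sep and value.strip():
            info.setdefault(key.strip(), value.strip())
    return info


def nic_summary(output: str) -> Dict[str, str]:
    """Pick out the fields of interest, marking missing ones as 'n/a'."""
    info = parse_nic_get(output)
    return {name: info.get(name, "n/a") for name in FIELDS_OF_INTEREST}


def esxcli_nic_get(vmnic: str) -> str:
    """Run `esxcli network nic get -n <vmnic>` locally and return its output."""
    completed = subprocess.run(
        ["esxcli", "network", "nic", "get", "-n", vmnic],
        capture_output=True, text=True, check=True,
    )
    return completed.stdout


# Example on the host (vmnic0 is a placeholder):
# for name, value in nic_summary(esxcli_nic_get("vmnic0")).items():
#     print(f"{name}: {value}")
```

Comparing the summaries taken before and after the update shows whether the new driver and firmware actually took effect on the adapter suspected of sending the pause frames.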
After I applied Action Plan 1, my ESXi host immediately became reachable by ping again, which resolved my issue. I hope this is informative for you. Thanks for reading! Be social and share it on social media if you feel it is worth sharing.
