🛠️ ESXi Host Troubleshooting Checklist (With Complete Log Locations)

1. Host Status & Connectivity

  • Check host state in vCenter (Connected / Not Responding / Disconnected)
  • Review CPU, RAM, and datastore usage
  • Review HA status and any pending DRS recommendations

2. Hardware Health

  • Monitor hardware sensors (CPU, DIMMs, fans, PSU, RAID)
  • Check RAID controller logs via vendor tools
  • View hardware status in:
    • vCenter → Monitor → Hardware Health
    • DCUI → Hardware Status

3. Network Validation

  • Verify vSwitches, Port Groups, NIC teaming, VLAN tagging
  • Check vmkernel ports (mgmt, vMotion, iSCSI, vSAN)
  • Test reachability using:
    Shell
    vmkping <IP>
  • Validate physical NIC status and link speed
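
The reachability test above can be narrowed to a single vmkernel interface and a specific MTU. The following is a minimal sketch, assuming a POSIX shell such as the ESXi Shell; the function names, the interface `vmk1`, and the address `192.0.2.10` are placeholders for illustration, not values from this checklist. It relies on the `vmkping` options `-I` (source vmkernel interface), `-d` (do not fragment), `-s` (ICMP payload size), and `-c` (packet count).

```shell
#!/bin/sh
# Largest ICMP payload that fits in one unfragmented frame:
# MTU minus 20 bytes (IPv4 header) minus 8 bytes (ICMP header).
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Ping a target from one vmkernel interface with the don't-fragment bit set.
# Arguments: <vmk interface> <target IP> <MTU>   (all hypothetical examples)
test_vmk_path() {
    vmk="$1"; target="$2"; mtu="$3"
    size=$(icmp_payload_for_mtu "$mtu")
    vmkping -I "$vmk" -d -s "$size" -c 3 "$target"
}

# Typical use on the host (not executed here):
#   esxcli network nic list                  # physical NIC link state and speed
#   esxcli network ip interface ipv4 get     # vmkernel interface addresses
#   test_vmk_path vmk1 192.0.2.10 9000       # jumbo-frame path check
```

If the jumbo-frame ping fails while a default-size ping succeeds, an MTU mismatch somewhere along the path is a likely cause.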

4. Storage & Datastore Checks

  • Confirm datastore accessibility
  • Rescan storage adapters (iSCSI/FC/NFS)
  • Validate multipathing (Round Robin / Fixed / MRU)
  • Check datastore latency with esxtop
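
The storage checks above map to a handful of `esxcli` commands. As a rough sketch, the helper below (the name `psp_summary` is made up for this example) counts how many devices use each path selection policy by reading the `Path Selection Policy:` lines that `esxcli storage nmp device list` prints. Treat the exact output layout as an assumption and verify it on your build.

```shell
#!/bin/sh
# Summarize path selection policies from `esxcli storage nmp device list`
# output supplied on stdin. Prints "<policy> <device count>" per line, sorted.
psp_summary() {
    awk -F': ' '/Path Selection Policy:/ { count[$2]++ }
        END { for (p in count) print p, count[p] }' | sort
}

# Typical use on the host (not executed here):
#   esxcli storage core adapter rescan --all    # rescan all storage adapters
#   esxcli storage filesystem list              # mounted datastores and state
#   esxcli storage nmp device list | psp_summary
#   esxtop                                      # press 'u' for the disk device view
#                                               # and watch DAVG/cmd, KAVG/cmd, GAVG/cmd
```

A mix of policies on LUNs from the same array is often worth a second look, since it may point to devices that were claimed before a policy change.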

✅ 5. Critical Logs & Their Locations (Full List)

Here are key ESXi log files that every VMware admin should know:

🔹 Core ESXi Logs

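| Log File | Path | Purpose |
|---|---|---|
| vmkernel.log | /var/log/vmkernel.log | Storage, HBA, NIC drivers, device errors, kernel events |
| hostd.log | /var/log/hostd.log | Host management service; VM operations, host config |
| vpxa.log | /var/log/vpxa.log | vCenter agent communication from the ESXi side |
| vmkwarning.log | /var/log/vmkwarning.log | System warnings from the kernel |
| messages | /var/log/messages | General OS-level events (legacy file; typically absent on ESXi 5.0 and later, where vmkernel.log and syslog.log cover these events) |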

🔹 VM & Storage Specific

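| Log File | Path | Purpose |
|---|---|---|
| vmfs.log | /var/log/vmfs.log | VMFS volumes, mount/unmount, metadata issues |
| fdm.log | /var/log/fdm.log | HA agent logs |
| vobd.log | /var/log/vobd.log | VOB notifications (system health, storage alerts) |
| nfs.log | /var/log/nfs.log | NFS datastore issues |
| iscsi.log | /var/log/iscsi.log | iSCSI storage adapter logs |
| sanfs.log | /var/log/sanfs.log | vSAN file service logs |

Note: vmfs.log, nfs.log, iscsi.log, and sanfs.log may not be present on every ESXi build. If a file is missing, storage events (VMFS, NFS, iSCSI) are usually recorded in vmkernel.log.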

🔹 VM-Specific Log

Each VM has its own log file inside its folder:

/vmfs/volumes/<datastore>/<VM-Name>/vmware.log

🔹 ESXi System Services

| Log File | Path | Purpose |
|---|---|---|
| auth.log | /var/log/auth.log | SSH logins, authentication attempts |
| syslog.log | /var/log/syslog.log | System-level logging, forwarding info |
| shell.log | /var/log/shell.log | ESXi Shell/SSH activity |
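
When working through the logs listed in this section, a quick keyword filter often narrows things down before a full read. This is a minimal sketch, assuming a POSIX shell; the function name `recent_errors` and the keyword list are illustrative choices, not an official filter.

```shell
#!/bin/sh
# Print the last N lines of a log that match common failure keywords.
# Arguments: <log file> [max lines, default 20]
recent_errors() {
    log="$1"; max="${2:-20}"
    grep -iE 'error|fail|warning|timeout' "$log" | tail -n "$max"
}

# Typical use on the host (not executed here):
#   recent_errors /var/log/vmkernel.log
#   recent_errors /var/log/hostd.log 50
#   tail -f /var/log/vobd.log        # follow new events live
```

The keyword match is deliberately broad, so expect some noise; tighten the pattern once you know which subsystem is involved.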

6. Restart Management Services

If the host becomes unresponsive to vCenter:

Restart management stack:

Shell
services.sh restart

Restart individually:

Shell
/etc/init.d/hostd restart
/etc/init.d/vpxa restart
/etc/init.d/vsanmgmtd restart

Or via DCUI → Troubleshooting Options.
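
Before restarting anything, it can help to check which agents are actually down. The sketch below restarts only services whose `status` output does not report them as running. It assumes that the init scripts print a line containing "is running" when the service is up, which you should confirm on your build; the `INIT_DIR` variable and both function names are hypothetical conveniences for this example.

```shell
#!/bin/sh
# Directory holding the init scripts; overridable for dry runs (assumption).
INIT_DIR="${INIT_DIR:-/etc/init.d}"

# Succeeds when the service's status output contains "is running".
# Assumes output such as "hostd is running." / "hostd is not running."
service_running() {
    "$INIT_DIR/$1" status 2>/dev/null | grep -q 'is running'
}

# Restart only the listed services that are not currently running.
restart_if_down() {
    for svc in "$@"; do
        if service_running "$svc"; then
            echo "$svc: running, left alone"
        else
            echo "$svc: not running, restarting"
            "$INIT_DIR/$svc" restart
        fi
    done
}

# Typical use on the host (not executed here):
#   restart_if_down hostd vpxa
```

Restarting individual agents this way is less disruptive than restarting the whole management stack, at the cost of missing problems in services you did not list.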


7. VM-Level Checks

  • Check VM power state
  • Validate VMware Tools
  • Monitor for snapshot sprawl
  • Review VM log file (vmware.log)
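
The VM-level checks above can also be run from the ESXi Shell. As a rough sketch, the helper below counts snapshot disk descriptors in a VM folder by matching the usual `<name>-000001.vmdk` naming; the function name is invented for this example, and a base disk whose name happens to end in a dash and six digits would be counted as well.

```shell
#!/bin/sh
# Count snapshot disk descriptors (files named like <name>-000001.vmdk)
# directly inside a VM folder. Delta/sesparse data files are not counted.
count_snapshot_disks() {
    find "$1" -maxdepth 1 -type f \
        -name '*-[0-9][0-9][0-9][0-9][0-9][0-9].vmdk' | wc -l | tr -d ' '
}

# Typical use on the host (not executed here):
#   vim-cmd vmsvc/getallvms                  # list VMs with their IDs
#   vim-cmd vmsvc/power.getstate <vmid>      # power state of one VM
#   vim-cmd vmsvc/snapshot.get <vmid>        # snapshot tree of one VM
#   count_snapshot_disks /vmfs/volumes/<datastore>/<VM-Name>
```

A count that keeps growing between checks is a reasonable early signal of snapshot sprawl.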

8. Patch, Firmware & Compatibility

  • Confirm the host is on the latest ESXi patch level
  • Verify firmware, driver, and hardware support against the VMware Compatibility Guide (HCL)
  • Confirm ESXi compatibility with the vCenter version
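
To check the patch level from the shell, the build number is usually the easiest value to compare. This is a minimal sketch, assuming `vmware -v` prints a string of the form `VMware ESXi <version> build-<number>`; the helper names and the sample build numbers are illustrative only, and build comparisons are only meaningful within the same release line.

```shell
#!/bin/sh
# Extract the build number from a `vmware -v` style string on stdin.
esxi_build() {
    sed -n 's/.*build-\([0-9][0-9]*\).*/\1/p'
}

# Succeeds when <current build> is greater than or equal to <required build>.
build_at_least() {
    [ "$1" -ge "$2" ]
}

# Typical use on the host (not executed here):
#   vmware -vl                        # version and build
#   esxcli system version get         # same data in key/value form
#   esxcli software vib list          # installed driver and component versions
#   build_at_least "$(vmware -v | esxi_build)" 22380479 && echo "patch level OK"
```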

