Windows VM crash investigation: How to use vmware.log to determine crash origin
Issue/Introduction
- A Windows virtual machine experiences an unexpected crash or blue screen (BSOD)
- The Windows crash dump analysis shows bugcheck 0x80 (NMI_HARDWARE_FAILURE)
- Microsoft or another OS vendor indicates the crash may be related to virtual hardware and recommends engaging VMware Support
- In the VM's vmware.log, you see entries similar to:
YYYY-MM-DDTHH:MM:SS.XXXZ vcpu-0 - WinBSOD: Synthetic MSR[0x40000100] 0x80
YYYY-MM-DDTHH:MM:SS.XXXZ vcpu-0 - WinBSOD: Synthetic MSR[0x40000101] 0x4f4454
YYYY-MM-DDTHH:MM:SS.XXXZ vcpu-0 - WinBSOD: Synthetic MSR[0x40000102] 0x0
YYYY-MM-DDTHH:MM:SS.XXXZ vcpu-0 - WinBSOD: Synthetic MSR[0x40000103] 0x0
YYYY-MM-DDTHH:MM:SS.XXXZ vcpu-0 - WinBSOD: Synthetic MSR[0x40000104] 0x0
- In hostd.log, you see entries similar to:
YYYY-MM-DDTHH:MM:SS.XXXZ Hostd[#####]: Guest operating system crash detected.
Environment
- VMware vSphere ESXi 7.x
- VMware vSphere ESXi 8.x
- Windows guest operating system
Cause
When a Windows guest OS crashes, it writes crash parameters to the hypervisor using synthetic Model-Specific Registers (MSRs). These are special registers that allow the guest to communicate with the VMware hypervisor. The WinBSOD entries in vmware.log represent Windows notifying the hypervisor about a crash that has already occurred inside the guest.
The presence of WinBSOD entries does not indicate the hypervisor caused the crash. It indicates the hypervisor received notification of a crash from Windows.
Resolution
Use the following methodology to determine whether a Windows VM crash originated from the hypervisor or from within the guest OS.
Understanding WinBSOD synthetic MSR entries
The synthetic MSR values in vmware.log correspond directly to Windows bugcheck parameters:
| MSR Address | vmware.log Label | Windows Bugcheck Parameter |
|---|---|---|
| 0x40000100 | MSR[0x40000100] | Bugcheck Code |
| 0x40000101 | MSR[0x40000101] | Arg1 |
| 0x40000102 | MSR[0x40000102] | Arg2 |
| 0x40000103 | MSR[0x40000103] | Arg3 |
| 0x40000104 | MSR[0x40000104] | Arg4 |
For example, if vmware.log shows MSR[0x40000100] 0x80, this corresponds to Windows bugcheck code 0x80 (NMI_HARDWARE_FAILURE).
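The mapping above can be applied mechanically. The following is a minimal sketch, not a VMware-provided tool: it assumes the log lines contain the text `WinBSOD: Synthetic MSR[<address>] <value>` exactly as in the sample entries, and the names `MSR_TO_FIELD` and `parse_winbsod` are hypothetical. The timestamp in the sample lines is a placeholder.

```python
import re

# Mapping taken from the table above: MSR address -> bugcheck field.
MSR_TO_FIELD = {
    0x40000100: "code",
    0x40000101: "arg1",
    0x40000102: "arg2",
    0x40000103: "arg3",
    0x40000104: "arg4",
}

# Matches only the "WinBSOD: Synthetic MSR[<addr>] <value>" part of a line.
# The timestamp and vcpu prefix are deliberately left unconstrained.
WINBSOD_RE = re.compile(
    r"WinBSOD: Synthetic MSR\[(0x[0-9a-fA-F]+)\]\s+(0x[0-9a-fA-F]+)"
)


def parse_winbsod(lines):
    """Return {field: int} for the WinBSOD entries found in lines.

    If a log holds more than one crash, later entries overwrite earlier
    ones, so pass only the lines for the crash under investigation.
    """
    params = {}
    for line in lines:
        match = WINBSOD_RE.search(line)
        if match is None:
            continue
        field = MSR_TO_FIELD.get(int(match.group(1), 16))
        if field is not None:
            params[field] = int(match.group(2), 16)
    return params


# Sample entries modeled on the ones shown earlier (placeholder timestamp).
sample = [
    "2024-01-01T00:00:00.000Z vcpu-0 - WinBSOD: Synthetic MSR[0x40000100] 0x80",
    "2024-01-01T00:00:00.000Z vcpu-0 - WinBSOD: Synthetic MSR[0x40000101] 0x4f4454",
    "2024-01-01T00:00:00.000Z vcpu-0 - WinBSOD: Synthetic MSR[0x40000102] 0x0",
    "2024-01-01T00:00:00.000Z vcpu-0 - WinBSOD: Synthetic MSR[0x40000103] 0x0",
    "2024-01-01T00:00:00.000Z vcpu-0 - WinBSOD: Synthetic MSR[0x40000104] 0x0",
]
params = parse_winbsod(sample)
print({name: hex(value) for name, value in params.items()})
```

Storing the values as integers rather than strings makes the later comparison with the crash dump independent of zero padding or letter case.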
Step 1: Collect the VM's vmware.log
- Identify the datastore path for the affected VM.
- Retrieve the vmware.log file from the VM's directory. If the VM has been powered on since the crash, the relevant log may be archived as vmware-1.log, vmware-2.log, etc.
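Once the VM directory has been copied off the datastore, it can help to find out which of the current and archived logs actually holds the crash entries. The sketch below assumes the logs sit in a single local directory and follow the `vmware.log`, `vmware-1.log`, ... naming described above. The function name `logs_with_winbsod` is hypothetical.

```python
from pathlib import Path


def logs_with_winbsod(vm_dir):
    """Return the names of vmware*.log files in vm_dir that contain WinBSOD entries."""
    hits = []
    for path in sorted(Path(vm_dir).glob("vmware*.log")):
        # errors="replace" keeps the scan going if a log has stray bytes.
        text = path.read_text(errors="replace")
        if "WinBSOD" in text:
            hits.append(path.name)
    return hits
```

Calling `logs_with_winbsod()` with the path to the copied VM directory returns a list such as the one archived log that recorded the crash, or an empty list if none did.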
Step 2: Confirm the crash parameters match
- Compare the WinBSOD synthetic MSR values in vmware.log to the bugcheck parameters from the Windows crash dump analysis.
- If the values match, this confirms both logs are describing the same crash event.
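The comparison is numeric, so differences in formatting should not count as a mismatch. This sketch assumes the crash dump analysis reports the bugcheck code and arguments as hexadecimal strings, possibly zero-padded and without a `0x` prefix, while vmware.log uses the `0x...` form. The names `to_int` and `same_crash` are hypothetical.

```python
def to_int(value):
    """Parse a hex string with or without a 0x prefix, e.g. '0x80' or '00000080'."""
    return int(value, 16)


def same_crash(log_values, dump_values):
    """Return True when all five values (code, Arg1..Arg4) agree numerically."""
    log_ints = [to_int(v) for v in log_values]
    dump_ints = [to_int(v) for v in dump_values]
    return len(log_ints) == 5 and log_ints == dump_ints


# Values from vmware.log (as in the sample entries) versus values as a
# crash dump report might print them (assumed formatting).
from_log = ["0x80", "0x4f4454", "0x0", "0x0", "0x0"]
from_dump = ["80", "00000000004F4454", "0000000000000000", "0", "0"]
print(same_crash(from_log, from_dump))
```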
Step 3: Check for hypervisor-initiated NMI
Review the following logs for evidence that the hypervisor sent an NMI to the guest:
Search hostd.log for NMI-related operations:
grep -i "Send_NMI_To_Guest\|HungVM" /var/log/hostd.log
If the results show /cgi-bin/vm-support.cgi?manifests=...HungVM:Send_NMI_To_Guest... near the crash time, the NMI was manually triggered by an administrator. See Manual Crash Triggered by NMI on Virtual machine hosted on vCenter/ESXi for more information.
Search vmkernel.log for NMI injection events:
grep -i "nmi" /var/log/vmkernel.log
If neither the hostd.log search nor the vmkernel.log search returns NMI injection events near the crash time, the hypervisor did not send an NMI to the guest.
Step 4: Rule out other hypervisor-side causes
Review the following logs for the timeframe surrounding the crash:
| Potential Cause | Log to Review | What to Search For |
|---|---|---|
| VM stun/snapshot | vmware.log, vmkernel.log | stun, snapshot, freeze, quiesce |
| Storage timeout | vmkernel.log | SCSI, abort, timeout, APD, PDL |
| Host memory error | vmkernel.log | MCE, MCA, ECC, machine check |
| Hardware event | IPMI/SEL logs | Events near the crash timestamp |
| Virtual device error | vmware.log | Device errors prior to the WinBSOD entries |
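Searching "the timeframe surrounding the crash" can be narrowed by filtering log lines to a window around the WinBSOD timestamp before matching keywords. The sketch below assumes each log line starts with a UTC timestamp in the `YYYY-MM-DDTHH:MM:SS.XXXZ` form shown earlier. The function names, the 10-minute default window, and the sample lines are all invented for illustration and are not real ESXi log output.

```python
import re
from datetime import datetime, timedelta

# Leading timestamp such as 2024-01-01T10:00:05.000Z.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{1,6})Z")


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if it has none."""
    match = TS_RE.match(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")


def events_near(lines, crash_time, keywords, window_minutes=10):
    """Return lines within the window around crash_time that contain a keyword.

    Matching is a case-insensitive substring test, so short keywords such
    as "APD" or "ECC" can match unrelated words; review every hit manually.
    """
    window = timedelta(minutes=window_minutes)
    lowered = [k.lower() for k in keywords]
    hits = []
    for line in lines:
        ts = parse_ts(line)
        if ts is None or abs(ts - crash_time) > window:
            continue
        text = line.lower()
        if any(k in text for k in lowered):
            hits.append(line)
    return hits


# Hypothetical lines, not real vmkernel.log messages.
demo_lines = [
    "2024-01-01T10:00:05.000Z hypothetical: storage timeout",
    "2024-01-01T12:00:00.000Z hypothetical: storage timeout",
    "2024-01-01T10:01:00.000Z hypothetical: unrelated message",
]
crash = datetime(2024, 1, 1, 10, 0, 0)
for hit in events_near(demo_lines, crash, ["SCSI", "abort", "timeout", "APD", "PDL"]):
    print(hit)
```

The keyword list passed in the demo is the "Storage timeout" row of the table. The other rows can be checked the same way by swapping in their search terms and the corresponding log file.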
Step 5: Interpret the findings
If no hypervisor-side events are found: The crash originated within the Windows guest OS. The WinBSOD entries confirm Windows reported its own crash to the hypervisor. Continue investigation with the OS vendor (Microsoft).
If hypervisor-side events are found: The crash may have been triggered or influenced by the hypervisor or underlying infrastructure. Collect an ESXi support bundle and engage VMware Support for further analysis.
Additional Information
- How to determine Virtual Machine Operations looking at the Virtual machine logs
- Manual Crash Triggered by NMI on Virtual machine hosted on vCenter/ESXi
- Troubleshooting a Virtual Machine that has stopped responding (VM hang/freeze)
- How to send NMI to Guest OS on ESXi 6.x or Later
- Microsoft: Bug Check 0x80: NMI_HARDWARE_FAILURE