Windows VM crash investigation: How to use vmware.log to determine crash origin

Issue/Introduction

  • A Windows virtual machine experiences an unexpected crash or blue screen (BSOD)
  • The Windows crash dump analysis shows bugcheck 0x80 (NMI_HARDWARE_FAILURE)
  • Microsoft or another OS vendor indicates the crash may be related to virtual hardware and recommends engaging VMware Support
  • In the VM's vmware.log, you see entries similar to:
    YYYY-MM-DDTHH:MM:SS.XXXZ vcpu-0 - WinBSOD: Synthetic MSR[0x40000100] 0x80
    YYYY-MM-DDTHH:MM:SS.XXXZ vcpu-0 - WinBSOD: Synthetic MSR[0x40000101] 0x4f4454
    YYYY-MM-DDTHH:MM:SS.XXXZ vcpu-0 - WinBSOD: Synthetic MSR[0x40000102] 0x0
    YYYY-MM-DDTHH:MM:SS.XXXZ vcpu-0 - WinBSOD: Synthetic MSR[0x40000103] 0x0
    YYYY-MM-DDTHH:MM:SS.XXXZ vcpu-0 - WinBSOD: Synthetic MSR[0x40000104] 0x0
    
  • In hostd.log, you see entries similar to:
    YYYY-MM-DDTHH:MM:SS.XXXZ Hostd[#####]: Guest operating system crash detected.
    

Environment

  • VMware vSphere ESXi 7.x
  • VMware vSphere ESXi 8.x
  • Windows guest operating system

Cause

When a Windows guest OS crashes, it reports the bugcheck code and its four parameters to the hypervisor by writing them to synthetic Model-Specific Registers (MSRs), special registers that let the guest communicate with the VMware hypervisor. The WinBSOD entries in vmware.log record that notification: Windows is telling the hypervisor about a crash that has already occurred inside the guest.

The presence of WinBSOD entries does not indicate the hypervisor caused the crash. It indicates the hypervisor received notification of a crash from Windows.

Resolution

Use the following methodology to determine whether a Windows VM crash originated from the hypervisor or from within the guest OS.

Understanding WinBSOD synthetic MSR entries

The synthetic MSR values in vmware.log correspond directly to Windows bugcheck parameters:

MSR Address   vmware.log Label   Windows Bugcheck Parameter
0x40000100    MSR[0x40000100]    Bugcheck Code
0x40000101    MSR[0x40000101]    Arg1
0x40000102    MSR[0x40000102]    Arg2
0x40000103    MSR[0x40000103]    Arg3
0x40000104    MSR[0x40000104]    Arg4

For example, if vmware.log shows MSR[0x40000100] 0x80, this corresponds to Windows bugcheck code 0x80 (NMI_HARDWARE_FAILURE).
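
As a minimal sketch of the mapping above, the following shell function labels each WinBSOD value with its bugcheck parameter name. The name winbsod_params is hypothetical (it is not a VMware tool). The sketch assumes the line format shown in this article and that awk is available wherever the log is read, for example on a workstation after copying the log off the host.

```shell
# winbsod_params is a hypothetical helper, not a VMware tool.
# Assumes lines of the form shown in this article:
#   <timestamp> vcpu-N - WinBSOD: Synthetic MSR[0x4000010X] <value>
winbsod_params() {
    # $1: path to vmware.log (or an archived vmware-N.log)
    awk '/WinBSOD: Synthetic MSR\[0x4000010[0-4]\]/ {
        msr = $(NF - 1)                          # e.g. MSR[0x40000100]
        idx = substr(msr, length(msr) - 1, 1)    # last hex digit: 0..4
        label = (idx == "0") ? "BugcheckCode" : ("Arg" idx)
        print label "=" $NF                      # e.g. BugcheckCode=0x80
    }' "$1"
}
```

Run as `winbsod_params vmware.log` against the sample entries in this article, it prints BugcheckCode=0x80, Arg1=0x4f4454, and Arg2 through Arg4 as 0x0.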

Step 1: Collect the VM's vmware.log

  1. Identify the datastore path for the affected VM.
  2. Retrieve the vmware.log file from the VM's directory. If the VM has been powered on again since the crash, the log has been rotated and the relevant entries may be in an archived file such as vmware-1.log or vmware-2.log.
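
One way to find which log file holds the crash entries is to list every vmware*.log in the VM directory that contains a WinBSOD line. This is only a sketch: find_crash_logs is a hypothetical name, and the directory argument is whatever path you identified in the first step.

```shell
# find_crash_logs is a hypothetical helper, not a VMware tool.
find_crash_logs() {
    # $1: the VM's directory on the datastore (the path identified above)
    # Prints the name of each current or archived log that has WinBSOD entries.
    grep -l 'WinBSOD' "$1"/vmware*.log 2>/dev/null
}
```

If the function prints nothing, none of the retained logs in that directory recorded a guest crash notification.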

Step 2: Confirm the crash parameters match

  1. Compare the WinBSOD synthetic MSR values in vmware.log to the bugcheck parameters from the Windows crash dump analysis.
  2. If the values match, this confirms both logs are describing the same crash event.
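
The comparison can be sketched as a shell function. Crash dump tools often print the same values with different zero padding or letter case, so the sketch normalizes both sides first. The names norm_hex and bugcheck_matches are hypothetical, and the function assumes the log contains the five entries of a single crash.

```shell
# Hypothetical helpers, not VMware tools.

norm_hex() {
    # Lowercase, drop an optional 0x prefix and leading zeros; "" becomes "0".
    h=$(printf '%s' "$1" | tr 'A-FX' 'a-fx' | sed -e 's/^0x//' -e 's/^0*//')
    printf '%s\n' "${h:-0}"
}

bugcheck_matches() {
    # $1: vmware.log path
    # $2..$6: bugcheck code and Arg1..Arg4 as reported by the crash dump analysis
    # Returns 0 only if every logged value equals the corresponding dump value.
    log=$1
    shift
    for logged in $(awk '/WinBSOD: Synthetic MSR\[0x4000010[0-4]\]/ { print $NF }' "$log"); do
        [ "$#" -gt 0 ] || return 1                                  # more log values than dump values
        [ "$(norm_hex "$logged")" = "$(norm_hex "$1")" ] || return 1
        shift
    done
    [ "$#" -eq 0 ]                                                   # no dump values left unmatched
}
```

For the sample entries in this article, `bugcheck_matches vmware.log 0x80 0x4f4454 0 0 0` succeeds, while passing a different bugcheck code makes it return a non-zero status.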

Step 3: Check for hypervisor-initiated NMI

Review the following logs for evidence that the hypervisor sent an NMI to the guest:

  1. Search hostd.log for NMI-related operations:

    grep -i "Send_NMI_To_Guest\|HungVM" /var/log/hostd.log
    

    If results show /cgi-bin/vm-support.cgi?manifests=...HungVM:Send_NMI_To_Guest... near the crash time, the NMI was manually triggered by an administrator. See "Manual Crash Triggered by NMI on Virtual machine hosted on vCenter/ESXi" for more information.

  2. Search vmkernel.log for NMI injection events:

    grep -i "nmi" /var/log/vmkernel.log
    
  3. If no NMI injection events are found, the hypervisor did not send an NMI to the guest.
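
Because a busy host log can contain NMI-related lines from unrelated times, it can help to restrict the search to lines near the crash. The sketch below takes the timestamp of the first WinBSOD entry and keeps only lines from the same ten-minute bucket. The names crash_window and nmi_near_crash are hypothetical, and both functions assume every log line starts with an ISO 8601 timestamp, as in the samples in this article.

```shell
# Hypothetical helpers, not VMware tools.

crash_window() {
    # $1: vmware.log. Prints the first 15 characters of the first WinBSOD
    # timestamp (YYYY-MM-DDTHH:M), which identifies a ten-minute bucket.
    awk '/WinBSOD: Synthetic MSR/ { print substr($1, 1, 15); exit }' "$1"
}

nmi_near_crash() {
    # $1: vmware.log, $2: log to search (for example hostd.log or vmkernel.log)
    w=$(crash_window "$1")
    [ -n "$w" ] || return 1          # no WinBSOD entry, nothing to anchor on
    grep "^$w" "$2" | grep -i 'nmi'
}
```

A prefix match is coarse: a crash logged just after a ten-minute boundary will not pick up events from the final seconds of the previous bucket, so review the neighbouring minutes by hand in that case.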

Step 4: Rule out other hypervisor-side causes

Review the following logs for the timeframe surrounding the crash:

Potential Cause        Log to Review              What to Search For
VM stun/snapshot       vmware.log, vmkernel.log   stun, snapshot, freeze, quiesce
Storage timeout        vmkernel.log               SCSI, abort, timeout, APD, PDL
Host memory error      vmkernel.log               MCE, MCA, ECC, machine check
Hardware event         IPMI/SEL logs              Events near the crash timestamp
Virtual device error   vmware.log                 Device errors prior to the WinBSOD entries
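
The keyword rows of the table can be turned into a quick first pass over a log. The sketch below counts, per cause, how many lines match any of that row's search terms. The name scan_causes is hypothetical, and the matching is deliberately broad (case-insensitive substrings), so treat a non-zero count as a prompt to read those lines, not as proof of a hypervisor-side fault.

```shell
# scan_causes is a hypothetical helper, not a VMware tool.
scan_causes() {
    # $1: log file to scan (for example vmkernel.log)
    # Prints "<cause>: <number of matching lines>" for each keyword row.
    while IFS='|' read -r name pattern; do
        printf '%s: %s\n' "$name" "$(grep -ciE "$pattern" "$1")"
    done <<'EOF'
VM stun/snapshot|stun|snapshot|freeze|quiesce
Storage timeout|SCSI|abort|timeout|APD|PDL
Host memory error|MCE|MCA|ECC|machine check
EOF
}
```

The hardware event and virtual device rows are left out on purpose: they call for reading IPMI/SEL records and the vmware.log lines before the WinBSOD entries, not for a keyword count.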

 

Step 5: Interpret the findings

If no hypervisor-side events are found: The crash originated within the Windows guest OS. The WinBSOD entries confirm Windows reported its own crash to the hypervisor. Continue investigation with the OS vendor (Microsoft).

If hypervisor-side events are found: The crash may have been triggered or influenced by the hypervisor or underlying infrastructure. Collect an ESXi support bundle and engage VMware Support for further analysis.
