In vSphere, Permanent Device Loss (PDL) and All Paths Down (APD) are two distinct storage connectivity conditions. PDL indicates that a storage device is permanently gone and is not expected to return. APD indicates that every path to the device is down and the host has no indication that the loss is permanent, so it treats the condition as transient and expects the device to potentially return. vSphere, both on its own and as part of VMware Cloud Foundation (VCF), provides mechanisms to handle PDL and APD and protect virtual machine availability.
Here's a more detailed explanation:
Permanent Device Loss (PDL):
Definition:
PDL occurs when a storage device is permanently lost, meaning it's unlikely to be recovered. This can be due to hardware failure, a device being removed without proper procedures, or other unrecoverable errors.
vSphere Handling:
vSphere interprets certain SCSI sense codes returned by the storage array as indicators of PDL. Once PDL is detected, the ESXi host stops retrying I/O to the affected device. If vSphere HA is configured with VM Component Protection (VMCP), the affected VMs can then be restarted on another host that still has access to the storage.
Example:
A LUN that is unmapped from the host or deleted on the array while still in use, or an array reporting an unrecoverable logical unit failure, would be treated as PDL.
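To make the sense-code check more concrete, here is a minimal Python sketch that looks for PDL indicators in a log line. It is an illustration only, not how ESXi implements the check. The table holds the sense key / ASC / ASCQ combinations commonly listed as PDL indicators, and the list should be verified against the documentation for your ESXi release. The sample log line and the function name `classify_sense` are assumptions made for this example.

```python
import re

# Sense key / ASC / ASCQ triples commonly documented as PDL indicators.
# Assumption: confirm this list against the documentation for your ESXi release.
PDL_SENSE_CODES = {
    (0x5, 0x25, 0x00): "LOGICAL UNIT NOT SUPPORTED",
    (0x4, 0x4C, 0x00): "LOGICAL UNIT FAILED SELF-CONFIGURATION",
    (0x4, 0x3E, 0x03): "LOGICAL UNIT FAILED SELF-TEST",
    (0x4, 0x3E, 0x01): "LOGICAL UNIT FAILURE",
}

# Matches a fragment such as "Valid sense data: 0x5 0x25 0x0".
SENSE_RE = re.compile(
    r"Valid sense data:\s*0x([0-9a-fA-F]+)\s+0x([0-9a-fA-F]+)\s+0x([0-9a-fA-F]+)"
)


def classify_sense(line):
    """Return the PDL description for a log line, or None if no PDL code is found."""
    match = SENSE_RE.search(line)
    if match is None:
        return None
    triple = tuple(int(group, 16) for group in match.groups())
    return PDL_SENSE_CODES.get(triple)


# Hypothetical sample line in the style of a vmkernel log entry.
sample = "H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0"
print(classify_sense(sample))  # prints "LOGICAL UNIT NOT SUPPORTED"
```

A line with any other sense data, or with no sense data at all, returns `None`. That mirrors the distinction drawn here: without an explicit PDL code, the host has no basis for declaring the device permanently lost.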
All Paths Down (APD):
Definition:
APD occurs when all paths to a storage device become unavailable to the ESXi host and the array has returned no PDL sense code, so the host cannot determine whether the loss is temporary or permanent.
vSphere Handling:
The ESXi host retries I/O to the affected device for a configurable timeout period (the Misc.APDTimeout advanced setting, 140 seconds by default). If the device recovers within that time, operations resume without further intervention. If the device is still unavailable when the timeout expires, the host starts to fast-fail non-virtual-machine I/O to the device, while virtual machine I/O continues to be retried indefinitely.
Example:
A network or fabric issue that makes the storage array unreachable, such as a failed switch or a disconnected cable, could lead to an APD condition.
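The timeout behavior described under vSphere Handling can be modeled in a few lines of Python. This is a simplified sketch of the decision logic, not ESXi code. The names `IoSource` and `apd_io_action` are hypothetical, and the split between virtual machine I/O and other host I/O reflects the assumption that only non-virtual-machine I/O is fast-failed after the timeout.

```python
from enum import Enum

# Default APD timeout in seconds, as stated in the text; configurable per host.
APD_TIMEOUT_SECONDS = 140


class IoSource(Enum):
    VIRTUAL_MACHINE = "vm"
    HOST = "host"  # non-virtual-machine I/O issued by the host itself


def apd_io_action(seconds_in_apd, source, timeout=APD_TIMEOUT_SECONDS):
    """Return "retry" or "fast-fail" for an I/O issued during an APD condition."""
    if seconds_in_apd < 0:
        raise ValueError("seconds_in_apd must be non-negative")
    if seconds_in_apd < timeout:
        # Timer still running: all I/O is retried.
        return "retry"
    if source is IoSource.VIRTUAL_MACHINE:
        # Timer expired: virtual machine I/O keeps being retried.
        return "retry"
    # Timer expired: other host I/O is failed immediately.
    return "fast-fail"


print(apd_io_action(30, IoSource.HOST))              # prints "retry"
print(apd_io_action(200, IoSource.HOST))             # prints "fast-fail"
print(apd_io_action(200, IoSource.VIRTUAL_MACHINE))  # prints "retry"
```

The model shows why a long APD tends to leave virtual machines stalled rather than failed: their I/O is never fast-failed by the timeout alone, so recovery depends on the device returning or on a separate mechanism such as VMCP acting on the VM.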
VMware Cloud Foundation (VCF) and PDL/APD:
Integration:
VCF, which integrates vSphere, vSAN, and NSX, also provides mechanisms to handle PDL and APD events.
Automated Remediation:
Because VCF deploys and manages the underlying vSphere clusters, vSphere HA with VMCP can be used in a VCF environment to restart virtual machines on healthy hosts in the event of PDL or APD.
Consistency:
VCF aims to provide a consistent and reliable infrastructure, and its management capabilities play a crucial role in mitigating the impact of storage connectivity issues.