🔷 VMware vSphere – Affinity, Anti-Affinity Rules & Admission Control (Detailed Explanation)

These features are critical in enterprise cluster design to ensure availability, compliance, performance, and predictable failover capacity.


🔹 1️⃣ Affinity & Anti-Affinity Rules (DRS Rules)

These are DRS cluster rules that control how VMs are placed across ESXi hosts.

They ensure workload placement aligns with business, licensing, and availability requirements.


✅ A. VM–VM Affinity Rules

🔹 What It Does:

Forces selected VMs to run together on the same host.

📌 Use Cases:

  • Multi-tier applications needing low latency (App + Middleware)

  • Application server tightly coupled with backend

  • Licensing tied to single host execution

🧠 Example:

Web Server + App Server must stay on the same host for performance.

DRS ensures:
✔ Both VMs move together during migration
✔ They restart together after HA failover
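
For readers who automate this, here is a minimal pyVmomi sketch of creating such a keep-together rule. The vCenter address, cluster name, and VM names are placeholders, and the find_obj inventory helper is our own; later sketches reuse this connection and helper.

```python
import ssl
from pyVim.connect import SmartConnect
from pyVim.task import WaitForTask
from pyVmomi import vim

# Hypothetical vCenter and credentials.
si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local", pwd="***",
                  sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

def find_obj(content, vimtype, name):
    """Return the first inventory object of the given type with this name."""
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vimtype], True)
    try:
        return next(o for o in view.view if o.name == name)
    finally:
        view.DestroyView()

cluster = find_obj(content, vim.ClusterComputeResource, "Prod-Cluster")
web_vm = find_obj(content, vim.VirtualMachine, "Web01")
app_vm = find_obj(content, vim.VirtualMachine, "App01")

# Keep-together rule: DRS places and keeps both VMs on the same host.
rule = vim.cluster.AffinityRuleSpec(
    name="keep-web-app-together", enabled=True, mandatory=False,
    vm=[web_vm, app_vm])
spec = vim.cluster.ConfigSpecEx(
    rulesSpec=[vim.cluster.RuleSpec(operation="add", info=rule)])
WaitForTask(cluster.ReconfigureComputeResource_Task(spec=spec, modify=True))
```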


❌ B. VM–VM Anti-Affinity Rules

🔹 What It Does:

Forces selected VMs to run on different hosts.

📌 Use Cases:

  • Domain Controllers

  • Cluster nodes (MS Cluster, Oracle RAC)

  • Redundant application servers

🧠 Example:

Two Active Directory Domain Controllers must not run on the same ESXi host.

If Host 1 fails:
✔ Only one DC affected
✔ Second DC remains available

This increases fault tolerance.
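
Separation is expressed the same way in the API. Assuming the connection and find_obj helper from the previous sketch, only the spec class changes (plus the mandatory flag, since DC separation is a strict rule):

```python
# Anti-affinity: DRS keeps DC1 and DC2 on different hosts.
dc1 = find_obj(content, vim.VirtualMachine, "DC1")
dc2 = find_obj(content, vim.VirtualMachine, "DC2")

rule = vim.cluster.AntiAffinityRuleSpec(
    name="separate-domain-controllers", enabled=True,
    mandatory=True,  # "Must" rule: HA will not violate it during failover
    vm=[dc1, dc2])
spec = vim.cluster.ConfigSpecEx(
    rulesSpec=[vim.cluster.RuleSpec(operation="add", info=rule)])
WaitForTask(cluster.ReconfigureComputeResource_Task(spec=spec, modify=True))
```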


🖥️ C. VM–Host Affinity Rules

🔹 What It Does:

Pins specific VMs to specific hosts (or host groups).

📌 Use Cases:

  • Software licensing (Oracle per-socket licensing)

  • Regulatory compliance

  • Dedicated hardware (GPU workloads)

Example:

Database VM can only run on Host Group A (licensed CPUs).
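
Unlike VM–VM rules, VM–Host rules reference named VM and host groups. A sketch, again reusing the earlier connection and helper, with hypothetical host and VM names:

```python
# Build the groups and the rule in one reconfigure call.
db_group = vim.cluster.VmGroup(
    name="DB-VMs",
    vm=[find_obj(content, vim.VirtualMachine, "DB01")])
host_group = vim.cluster.HostGroup(
    name="HostGroup-A",
    host=[find_obj(content, vim.HostSystem, "esxi01.example.com"),
          find_obj(content, vim.HostSystem, "esxi02.example.com")])

vm_host_rule = vim.cluster.VmHostRuleInfo(
    name="db-on-licensed-hosts", enabled=True,
    mandatory=True,  # "Must run on" — the VMs never leave the group
    vmGroupName="DB-VMs",
    affineHostGroupName="HostGroup-A")

spec = vim.cluster.ConfigSpecEx(
    groupSpec=[vim.cluster.GroupSpec(operation="add", info=db_group),
               vim.cluster.GroupSpec(operation="add", info=host_group)],
    rulesSpec=[vim.cluster.RuleSpec(operation="add", info=vm_host_rule)])
WaitForTask(cluster.ReconfigureComputeResource_Task(spec=spec, modify=True))
```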


⚙️ Rule Behavior Types

  • Must Rule → Strict enforcement (HA will respect rule)

  • Should Rule → Preferred but flexible (can violate if required for failover)

In enterprise design:
✔ Use MUST for compliance
✔ Use SHOULD for flexibility
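
In the vSphere API this distinction is simply the rule's mandatory boolean. Continuing the sketches above:

```python
# "Must" vs "Should" is controlled by the mandatory flag
# (VM objects reused from the earlier sketches):
must_rule = vim.cluster.AntiAffinityRuleSpec(
    name="strict-separation", enabled=True, mandatory=True, vm=[dc1, dc2])
should_rule = vim.cluster.AffinityRuleSpec(
    name="preferred-co-location", enabled=True, mandatory=False,
    vm=[web_vm, app_vm])
```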


🔹 2️⃣ Admission Control (HA Feature)

Admission Control ensures sufficient cluster capacity is reserved to tolerate host failures.

Without it:
Cluster may allow too many VMs → Failover may fail.


🎯 Purpose:

Guarantees resources for HA restart during host failure.


🔹 Admission Control Policies

1️⃣ Host Failures Cluster Tolerates (Slot Policy)

Example:
Cluster can tolerate 1 host failure.

HA reserves the full capacity of one host, calculated using slot sizes.

Best for:

  • Equal-sized hosts

  • Simple environments


2️⃣ Percentage of Cluster Resources Reserved (Recommended)

Example:
Reserve 25% CPU & Memory.

More flexible than host-based.

Best for:
✔ Uneven host sizes
✔ Enterprise clusters
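
A minimal sketch of enabling this policy through pyVmomi, reusing the cluster object from the earlier sketches; the 25% figures are examples:

```python
# Percentage-based admission control: hold back 25% CPU and 25% memory.
policy = vim.cluster.FailoverResourcesAdmissionControlPolicy(
    cpuFailoverResourcesPercent=25,
    memoryFailoverResourcesPercent=25)
das = vim.cluster.DasConfigInfo(
    enabled=True,                  # vSphere HA on
    admissionControlEnabled=True,
    admissionControlPolicy=policy)
spec = vim.cluster.ConfigSpecEx(dasConfig=das)
WaitForTask(cluster.ReconfigureComputeResource_Task(spec=spec, modify=True))
```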


3️⃣ Dedicated Failover Hosts

One or more hosts are kept idle, reserved exclusively for failover.

Pros:
✔ Predictable failover

Cons:
❌ Expensive (unused hardware)

Rarely used in modern design.


🔹 How Admission Control Works (Technical View)

When a VM is powered on, HA calculates:

  • CPU reservation

  • Memory reservation

  • Current failover capacity

If insufficient:
❌ VM power-on blocked
Error:
“Insufficient resources to satisfy configured failover level”

This protects cluster SLA.
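
A back-of-the-envelope version of that check, with invented numbers, shows why a power-on gets refused:

```python
# Hypothetical cluster: 4 hosts x 64 GB, 25% of memory reserved for failover.
total_mem_gb = 4 * 64                            # 256 GB cluster capacity
failover_mem_gb = total_mem_gb * 0.25            # 64 GB held back for HA
current_resv_gb = 180                            # existing VM memory reservations
new_vm_resv_gb = 16                              # VM being powered on

usable_gb = total_mem_gb - failover_mem_gb       # 192 GB admittable
if current_resv_gb + new_vm_resv_gb > usable_gb:
    print("Power-on blocked: insufficient failover capacity")  # 196 > 192
else:
    print("Power-on admitted")
```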


🔹 Real-World Enterprise Scenario

Scenario:

6-host cluster
Admission control set to tolerate 1 host failure.

If one host fails:
✔ HA restarts VMs
✔ Resources already reserved
✔ No capacity shortage

Without Admission Control:
❌ Overcommitment
❌ VM restart failure
❌ SLA breach


🔹 Interaction Between DRS Rules & HA

Important:

  • HA respects Must Rules

  • HA may violate Should Rules during emergency

  • DRS rebalances after failover

Design Tip:
Avoid too many strict anti-affinity rules in small clusters.


🔹 Enterprise Design Best Practices

✔ Minimum 3–4 hosts per cluster
✔ Use Percentage-based admission control
✔ Avoid overusing “Must” rules
✔ Monitor slot size if using slot-based policy
✔ Test failover scenarios periodically


🔹 Interview-Level Summary

Affinity → Keep VMs together
Anti-Affinity → Keep VMs apart
VM-Host Rules → Control VM location
Admission Control → Reserve capacity for failover

Affinity/Anti-affinity = Placement Control
Admission Control = Failover Protection


🔹 Final Thought

In enterprise environments, these features are not optional — they are critical for SLA compliance, licensing control, and workload resiliency.

Correctly configured:
✔ Prevents cluster risk
✔ Ensures compliance
✔ Guarantees failover capacity
✔ Improves operational stability



🔷 Advanced Enterprise Rule Design Example

For 10+ Host Cluster in VMware vSphere

This example reflects a production-grade enterprise environment such as a hospital, a BFSI institution, or a large enterprise data center running 200–500+ VMs.


🏗️ Scenario Overview

Cluster Design:

  • 12 ESXi Hosts

  • 2 CPU sockets per host

  • Shared SAN / vSAN storage

  • DRS: Fully Automated

  • HA: Enabled

  • Workloads:

    • Domain Controllers

    • Database Servers

    • Application Servers

    • Web Servers

    • Backup Servers

    • Licensing-restricted workloads (Oracle, GPU)

Goal:
✔ High availability
✔ Licensing compliance
✔ Balanced performance
✔ Controlled failover capacity


🔹 1️⃣ Cluster Segmentation Strategy (Logical Grouping)

Even inside a single cluster, we create logical separation using the following groups (a configuration sketch follows the lists):

Host Groups

  • HostGroup-A (Hosts 1–4) → Licensed DB workloads

  • HostGroup-B (Hosts 5–10) → General workloads

  • HostGroup-C (Hosts 11–12) → GPU / Special workloads

VM Groups

  • DB-VMs

  • Web-VMs

  • App-VMs

  • DC-VMs

  • Oracle-VMs

  • Backup-VMs
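
A sketch of building this layout in one reconfigure call, reusing the earlier connection and find_obj helper; all host and VM names are placeholders:

```python
# 12-host layout: host groups by role, VM groups by workload tier.
host_groups = {
    "HostGroup-A": [f"esxi{n:02d}.example.com" for n in range(1, 5)],   # hosts 1-4
    "HostGroup-B": [f"esxi{n:02d}.example.com" for n in range(5, 11)],  # hosts 5-10
    "HostGroup-C": [f"esxi{n:02d}.example.com" for n in range(11, 13)], # hosts 11-12
}
vm_groups = {
    "DB-VMs": ["DB01", "DB02"], "Web-VMs": ["Web01", "Web02"],
    "App-VMs": [f"App{n}" for n in range(1, 7)], "DC-VMs": ["DC1", "DC2"],
    "Oracle-VMs": ["ORA01", "ORA02"], "Backup-VMs": ["BKP01"],
}

group_specs = []
for name, hosts in host_groups.items():
    members = [find_obj(content, vim.HostSystem, h) for h in hosts]
    group_specs.append(vim.cluster.GroupSpec(
        operation="add", info=vim.cluster.HostGroup(name=name, host=members)))
for name, vms in vm_groups.items():
    members = [find_obj(content, vim.VirtualMachine, v) for v in vms]
    group_specs.append(vim.cluster.GroupSpec(
        operation="add", info=vim.cluster.VmGroup(name=name, vm=members)))

spec = vim.cluster.ConfigSpecEx(groupSpec=group_specs)
WaitForTask(cluster.ReconfigureComputeResource_Task(spec=spec, modify=True))
```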


🔹 2️⃣ Affinity & Anti-Affinity Rule Design


✅ A. Domain Controllers (Critical Design)

Rule:

DC1 and DC2 → MUST run on separate hosts
Type: VM–VM Anti-Affinity (Must)

Reason:
If one host fails → second DC remains available.


✅ B. Application Tier Separation

For 6 Application Servers:

Rule:
App1–App6 → Spread across different hosts (Should Anti-Affinity)

Reason:
Load distribution + redundancy.


✅ C. Web + App Tier Optimization

Rule:
Web1 & App1 → SHOULD run together (Affinity)

Reason:
Low latency between tiers.

But not MUST → allows HA flexibility.


✅ D. Database Licensing (Oracle Example)

Oracle is licensed on only 4 of the hosts.

Rule:
Oracle-VMs → MUST run only on HostGroup-A
(Type: VM–Host Affinity)

Prevents:
❌ Accidental DRS migration to unlicensed host
❌ Licensing audit risk


✅ E. GPU Workloads

GPU VMs → MUST run on HostGroup-C.

If one GPU host fails:
The remaining host in the group must absorb all GPU workloads, a capacity trade-off to plan for.


🔹 3️⃣ Admission Control Configuration

For 12-host cluster:

Policy:
✔ Percentage-based (Recommended)

Example:
Reserve 20–25% CPU & Memory

Why?
Two of twelve hosts is roughly 17% of cluster capacity, so a 20–25% reservation tolerates 2 host failures with headroom.

Failover Capacity Calculation:
If 2 hosts fail → Remaining 10 hosts must support all workloads.
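
The arithmetic behind that choice, with a hypothetical host size:

```python
# Hypothetical sizing: 12 hosts x 128 GB. How much must be reserved to
# cover two simultaneous host failures, and does 20-25% suffice?
hosts, host_mem_gb = 12, 128
failures = 2

needed_fraction = failures / hosts                    # 2/12 ~ 0.167
print(f"Minimum reservation: {needed_fraction:.1%}")  # 16.7%
print(f"That is {failures * host_mem_gb} GB "
      f"of {hosts * host_mem_gb} GB")                 # 256 of 1536 GB
# A 20-25% reservation therefore covers two failures with headroom.
```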


🔹 4️⃣ Advanced Enterprise Enhancements


🔹 Resource Pools

Create resource pools:

  • Tier-1 (Critical) → High shares

  • Tier-2 (Production) → Medium shares

  • Tier-3 (Dev/Test) → Low shares

Prevents dev workloads from starving production.
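
A sketch of creating these pools under the cluster's root resource pool via pyVmomi; the tier names and share levels mirror the list above:

```python
def alloc(level):
    """Unlimited, expandable allocation with the given shares level."""
    return vim.ResourceAllocationInfo(
        reservation=0, expandableReservation=True, limit=-1,
        shares=vim.SharesInfo(level=level, shares=0))

root = cluster.resourcePool  # the cluster's hidden root resource pool
for name, level in [("Tier-1", "high"),
                    ("Tier-2", "normal"),
                    ("Tier-3", "low")]:
    root.CreateResourcePool(name=name, spec=vim.ResourceConfigSpec(
        cpuAllocation=alloc(level), memoryAllocation=alloc(level)))
```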


🔹 Proactive HA Integration

With hardware monitoring:
If Host 4 shows memory degradation:
✔ VMs evacuated automatically
✔ Host quarantined

No production crash.


🔹 Predictive DRS

If database load spike predicted:
✔ VMs migrated proactively
✔ Avoids performance degradation


🔹 5️⃣ Rule Balance Strategy (Very Important)

In large clusters:

❌ Too many MUST rules = HA restriction
❌ Too many strict anti-affinity rules = Restart failure risk

Best Practice:

  • Critical infrastructure → MUST rules

  • Performance tuning → SHOULD rules


🔹 6️⃣ Failure Simulation Example

Scenario 1: Host 3 Fails

  • HA restarts VMs

  • DRS rebalances cluster

  • Anti-affinity rules respected


Scenario 2: Two Hosts Fail

  • Admission control reserved capacity

  • Critical VMs restart

  • Lower-tier workloads may delay


Scenario 3: Oracle Host Failure

  • HA restarts Oracle VM

  • But only inside HostGroup-A

  • Licensing compliance maintained


🔹 7️⃣ Enterprise Best Practices Checklist

✔ Minimum 5–6 hosts for rule flexibility
✔ Avoid strict anti-affinity in small clusters
✔ Always test HA failover quarterly
✔ Monitor DRS migration frequency
✔ Enable EVC for CPU compatibility
✔ Separate vMotion & Management network
✔ Maintain consistent firmware & BIOS


🔹 8️⃣ Interview-Level Enterprise Answer

“In a 10+ host enterprise cluster, we logically segment workloads using VM groups and host groups. Critical systems use anti-affinity rules, licensed workloads use VM-host affinity, and admission control is configured using percentage-based reservation to tolerate at least one or two host failures. We avoid excessive strict rules to maintain HA flexibility.”


🔹 Final Enterprise Takeaway

In large clusters:

Affinity/Anti-Affinity = Risk Distribution
VM-Host Rules = Compliance Control
Admission Control = SLA Protection
DRS = Performance Optimization
HA = Fault Recovery

When designed correctly, a 10+ host cluster becomes:

✔ Self-healing
✔ SLA-compliant
✔ License-safe
✔ Performance-balanced
✔ Enterprise-ready



