
why does disk I/O matter?


disk I/O is the reading and writing of data to and from storage, and its performance is usually measured as throughput, IOPS, and latency. poor disk performance directly affects response times and can cause bottlenecks even in high-end systems.


as systems scale, slow response times aren't always due to high CPU usage or memory pressure. often, the root cause is disk I/O latency.


during the early days of its cloud migration to AWS, Netflix faced challenges with disk I/O performance: disk latency and IOPS weren't sufficient to handle peak streaming demand.


some of the challenges they encountered were:


🔴 I/O bottlenecks in the cloud - before moving to SSD-backed instances, limited disk throughput was a major hurdle for running I/O-intensive apps in the cloud.


🔴 strained disk performance - Netflix's Cassandra clusters on earlier EC2 instance types (m2.4xlarge) struggled with I/O limits.


🔴 reliance on caching - to keep up with read demand, they deployed a large Memcached tier and used the ample RAM on Cassandra nodes to cache data, reducing disk hits.
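
to see why caching takes so much pressure off the disk, here is a minimal sketch of the usual weighted-average latency estimate. the function name and the latency figures are hypothetical and chosen only for illustration; they are not Netflix's numbers.

```python
def effective_read_latency_us(hit_ratio: float,
                              cache_latency_us: float,
                              disk_latency_us: float) -> float:
    """Average read latency when `hit_ratio` of reads are served from cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    # Hits pay the cache latency, misses pay the disk latency.
    return hit_ratio * cache_latency_us + (1.0 - hit_ratio) * disk_latency_us


# Assumed (hypothetical) costs: 100 us per cache hit, 10,000 us per disk read.
for ratio in (0.50, 0.90, 0.99):
    avg = effective_read_latency_us(ratio, 100, 10_000)
    print(f"hit ratio {ratio:.0%}: ~{avg:,.0f} us average read latency")
```

the point of the sketch is that the misses dominate: even at a high hit ratio, the average is still heavily shaped by how slow the disk is.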


so they took a benchmarking approach: they first ran low-level disk benchmarks on AWS's then-new hi1.4xlarge SSD instance to measure its raw performance. the tests showed that the instance could sustain over 100,000 IOPS and ~1 GB/s of throughput with very low latency (20 to 60 microseconds).
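
for a feel of what a low-level disk benchmark measures, here is a rough single-threaded sketch in Python. it is not the tooling Netflix used; the function name, block size, and scratch-file size are assumptions for illustration. because it reads a freshly written file through the OS page cache, the numbers mostly reflect memory, not the device. dedicated tools such as fio exist to bypass the cache and drive deeper queues.

```python
import os
import random
import tempfile
import time


def random_read_benchmark(path: str, block_size: int = 4096, reads: int = 2000) -> dict:
    """Time `reads` random block-aligned reads from `path` and summarise them."""
    blocks = os.path.getsize(path) // block_size
    latencies = []
    with open(path, "rb", buffering=0) as f:  # unbuffered at the Python level
        for _ in range(reads):
            offset = random.randrange(blocks) * block_size
            start = time.perf_counter()
            f.seek(offset)
            f.read(block_size)
            latencies.append(time.perf_counter() - start)
    total = sum(latencies)
    iops = reads / total
    return {
        "iops": iops,
        # throughput is simply IOPS multiplied by the size of each operation
        "throughput_mb_s": iops * block_size / 1e6,
        "avg_latency_us": total / reads * 1e6,
    }


# Write a 16 MiB scratch file (size is arbitrary), benchmark it, then clean up.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(16 * 1024 * 1024))
    scratch = tmp.name
try:
    print(random_read_benchmark(scratch))
finally:
    os.remove(scratch)
```

the three numbers it reports (IOPS, throughput, average latency) are the same kinds of figures quoted above, which is why a benchmark like this is a useful first step before testing the full application.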


key findings from the benchmark:


✅ SSD lived up to the hype and validated AWS's claims


✅ ample I/O headroom


✅ bottleneck shifted to CPU - the SSDs provided so much I/O capacity that the cluster's performance was now constrained by processing power rather than disk speed


✅ dramatically lower latency


📘 sources:


1️⃣ (Benchmarking High Performance I/O with SSD for Cassandra on AWS) https://lnkd.in/dvdBmdtE


2️⃣ (JMeter Plugin for Cassandra) https://lnkd.in/dFYc9p4N


we still face disk I/O challenges today -- perhaps even more than before due to rapidly increasing data volumes and high user expectations.


modern disk I/O challenges include:


🔴 cloud variability - even with high-performance SSD-based storage, disk performance can fluctuate significantly in shared multi-tenant environments


🔴 microservices and containerization - modern tech stacks built on microservices often multiply disk I/O operations


🔴 database and analytics workloads - heavy database operations (for example, analytics with Snowflake) can still create significant disk I/O bottlenecks
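
fluctuating performance rarely shows up in averages, so it helps to look at latency percentiles. the sketch below uses only Python's standard library; the function name and the sample values are made up for illustration.

```python
import statistics


def latency_summary(samples_ms: list[float]) -> dict:
    """Summarise latency samples with percentiles instead of a single average."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(samples_ms),
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
        "max": max(samples_ms),
    }


# Hypothetical samples: mostly ~1 ms reads with a few 40 ms spikes mixed in.
samples = [1.0] * 97 + [40.0] * 3
print(latency_summary(samples))
```

with samples like these, the median stays low while the tail percentiles jump, which is the pattern to watch for on shared storage.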


so don't stop monitoring, don't assume, and stay curious...


