
why disk I/O matters


disk I/O refers to the rate at which data is read from or written to storage. poor disk performance directly affects response times, causing bottlenecks even on high-end systems.


as systems scale, slow response times aren't always due to high CPU usage or memory pressure. often, the root cause is disk I/O latency.
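
to tell the two apart in practice, a quick look at iowait versus CPU-busy time is often enough. here's a minimal sketch (my own illustration, not from the original post), assuming a Linux host with the psutil package installed:

```python
# sample the CPU time split for a few seconds: high iowait with mostly idle
# CPUs usually points at a storage bottleneck rather than a compute one.
# assumes Linux (the iowait field is Linux-specific) and `pip install psutil`.
import psutil

for _ in range(5):
    cpu = psutil.cpu_times_percent(interval=1.0)
    print(f"user+sys: {cpu.user + cpu.system:5.1f}%   "
          f"iowait: {cpu.iowait:5.1f}%   idle: {cpu.idle:5.1f}%")
```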


Netflix, during its early days of migrating to AWS, faced challenges with disk I/O performance: disk latency and IOPS weren't sufficient to handle peak streaming demand.


some of the challenges they encountered were:


🔴 I/O bottlenecks in the cloud - before moving to an SSD-backed instance type, limited disk throughput was a major hurdle for running I/O-intensive apps in the cloud.


🔴 strained disk performance - Netflix's Cassandra clusters on earlier EC2 instance types (m2.4xlarge) struggled with I/O limits.


🔴 reliance on caching - to keep up with read demand, they deployed a large Memcached tier and leveraged ample RAM on Cassandra nodes to cache data, reducing disk hits.


so they took a benchmarking approach: first, they ran low-level disk benchmarks on AWS's new hi1.4xlarge SSD instance to measure raw performance. the tests showed the instance could sustain over 100,000 IOPS and ~1 GB/s of throughput at very low latency (20 to 60 microseconds).
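
to give a feel for what a low-level disk benchmark looks like, here's a rough micro-benchmark sketch in Python (not Netflix's actual harness, which used dedicated tooling): it issues random 4 KiB reads against a scratch file and reports IOPS and latency percentiles. the scratch path, file size, and request count are arbitrary example values, and the page cache isn't bypassed here, so treat the numbers as indicative only.

```python
# rough random-read micro-benchmark (illustrative only): measures 4 KiB read
# latency and IOPS against a scratch file.
import os
import random
import statistics
import time

PATH = "/tmp/io_bench.dat"          # hypothetical scratch file location
FILE_SIZE = 128 * 1024 * 1024       # 128 MiB test file
BLOCK = 4096                        # 4 KiB requests, a common IOPS block size
READS = 5000

# build the test file in 1 MiB chunks
with open(PATH, "wb") as f:
    for _ in range(FILE_SIZE // (1024 * 1024)):
        f.write(os.urandom(1024 * 1024))

fd = os.open(PATH, os.O_RDONLY)
blocks = FILE_SIZE // BLOCK
latencies = []
start = time.perf_counter()
for _ in range(READS):
    offset = random.randrange(blocks) * BLOCK
    t0 = time.perf_counter()
    os.pread(fd, BLOCK, offset)     # one random 4 KiB read
    latencies.append(time.perf_counter() - t0)
elapsed = time.perf_counter() - start
os.close(fd)

latencies.sort()
print(f"IOPS    : {READS / elapsed:,.0f}")
print(f"p50 lat : {statistics.median(latencies) * 1e6:.1f} us")
print(f"p99 lat : {latencies[int(READS * 0.99)] * 1e6:.1f} us")
```

for serious testing you'd reach for a tool like fio, bypass the page cache with direct I/O, and use a working set much larger than RAM, but the idea is the same: measure IOPS, throughput, and latency at the device level before blaming the application.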


key findings from the benchmark:


✅ SSD lived up to the hype and validated AWS's claims


✅ ample I/O headroom


✅ bottleneck shifted to CPU - the SSDs provided so much I/O capacity that the cluster's performance was now constrained by processing power rather than disk speed


✅ dramatically lower latency


📘 sources:


1️⃣ Benchmarking High Performance I/O with SSD for Cassandra on AWS - https://lnkd.in/dvdBmdtE


2️⃣ JMeter Plugin for Cassandra - https://lnkd.in/dFYc9p4N


we still face disk I/O challenges today, perhaps even more than before, due to rapidly growing data volumes and rising user expectations.


modern disk I/O challenges include:


🔴 cloud variability - even with high-performance SSD-backed storage, disk performance can fluctuate significantly in shared, multi-tenant environments


🔴 microservices and containerization - modern stacks built on microservices often multiply disk I/O operations


🔴 database and analytics workloads - heavy database operations (e.g. analytics with Snowflake) can still create significant disk I/O bottlenecks


so don't stop monitoring, don't assume and stay curious...
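
to make "don't stop monitoring" concrete, here's a rough monitoring sketch (my own illustration, again assuming psutil on Linux) that polls system-wide disk counters once per second and prints throughput and IOPS deltas, similar in spirit to running iostat -x 1:

```python
# poll system-wide disk counters every second and print the deltas as
# read/write throughput (MB/s) and total IOPS. stop with Ctrl-C.
import time
import psutil

prev = psutil.disk_io_counters()
while True:
    time.sleep(1)
    cur = psutil.disk_io_counters()
    read_mb = (cur.read_bytes - prev.read_bytes) / 1e6
    write_mb = (cur.write_bytes - prev.write_bytes) / 1e6
    iops = (cur.read_count - prev.read_count) + (cur.write_count - prev.write_count)
    print(f"read {read_mb:7.1f} MB/s   write {write_mb:7.1f} MB/s   {iops:7d} IOPS")
    prev = cur
```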


