Let’s face it—cloud services are fantastic. They empower you to build scalable, resilient applications leveraging virtualized compute instances, Kubernetes, and cloud databases. But what happens when those apps start to feel sluggish, unresponsive, or underperforming for no obvious reason? You’ve checked CPU, memory, and disk I/O, and everything seems fine. So what’s the deal?
The hidden culprit could be network throttling by your cloud provider, impacting latency, errors, traffic, and saturation—the four golden signals of system reliability.
Cloud Providers and Network Throttling
Yes, it’s true: your cloud provider might be silently throttling your network traffic, especially when nodes in your Kubernetes cluster exceed their allocated bandwidth. Major cloud providers like AWS, Azure, and Google Cloud Platform (GCP) impose bandwidth limits based on instance size, and exceeding these limits can degrade all four golden signals.
Why is this a problem?
Throttling can lead to dropped packets, increased latency, and unpredictable application behavior. Because the rules for when cloud providers throttle your network traffic are complex, vary by provider, and are often hard to follow, it’s difficult to anticipate and prevent these issues. This lack of transparency makes diagnosing and resolving performance problems a real challenge.
Let’s break it down with an example using AWS.
AWS Throttling: What You Need to Know
In AWS, network bandwidth throttling is often tied to the size of your instance. Take the r6g.2xlarge instance as an example. It has a baseline bandwidth of 2.5 Gbps and can burst up to 10 Gbps. But there’s a catch:
- Burst Credits: AWS allows you to temporarily exceed your baseline bandwidth by using burst credits, but once those run out, your network traffic is throttled back down.
- Per-Flow Limits: AWS also caps a single flow (one connection) at 5 Gbps when the instances are not in the same cluster placement group, which can limit applications that push large transfers over one connection.
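The credit mechanism above behaves much like a token bucket. As a rough sketch, the function below estimates how long an instance could hold a given send rate before being pushed back to baseline. The 2.5 Gbps and 10 Gbps figures come from the r6g.2xlarge example; the 750 Gbit credit balance is a made-up placeholder and the function name is hypothetical. This is a simplified model, not AWS’s actual accounting.

```python
import math

def seconds_until_throttled(baseline_gbps, send_gbps, credit_gbit):
    """Estimate how long a send rate can be sustained before credits run out.

    Simplified token-bucket model: credits drain at (send - baseline) Gbit/s
    and are not refilled while sending above baseline.
    """
    if send_gbps <= baseline_gbps:
        return math.inf  # at or below baseline, credits never drain
    return credit_gbit / (send_gbps - baseline_gbps)

# Baseline and burst from the r6g.2xlarge example; 750 Gbit is an assumed balance.
print(seconds_until_throttled(2.5, 10.0, 750.0))  # 100.0
print(seconds_until_throttled(2.5, 4.0, 750.0))   # 500.0
print(seconds_until_throttled(2.5, 2.0, 750.0))   # inf
```

Under these assumptions, bursting at the full 10 Gbps exhausts the credits five times faster than running at 4 Gbps.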
Other cloud providers have similar policies, each with their own unique twists. The complexity of these mechanisms and the lack of transparency can make diagnosing and solving network throttling problems a real headache. But don’t worry—there are ways to combat it.
How Throttling Wreaks Havoc on Your Application
Network throttling doesn’t just slow down your app. It can cause serious problems:
- Increased Latency: Dropped packets trigger TCP retransmissions, which can add 200 ms or more to your response times.
- Application Timeouts: Services may fail while waiting for data, leading to errors.
- Sluggish Performance: The whole app feels slow, resulting in poor user experiences.
- Unpredictable Behavior: Mysterious, intermittent issues can arise, making debugging a nightmare.
- Scaling Misconfigurations: You may lower CPU or memory thresholds to force scaling, which temporarily fixes the issue but inflates your costs.
- Resource Waste: Over-provisioning instances to gain more bandwidth may work, but it drives up your cloud bill.
- Monitoring Blind Spots: Standard tools often miss throttling issues, leaving you unaware of the problem.
In short, throttling can be a silent killer for your applications.
How to Diagnose and Address Throttling
Let’s look at two common scenarios where network throttling can hit your app—and what you can do about it.
Scenario 1: Sustained High Utilization
If your node’s average network usage exceeds the baseline bandwidth for its instance size, throttling is likely. The fix?
- Upgrade Instance Size: Move to a larger instance with more bandwidth.
- Distribute Load: Spread the network load across more nodes to reduce the burden on individual instances.
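Before resizing anything, it helps to confirm that throttling is actually happening. On AWS instances that use the ENA driver, `ethtool -S <interface>` reports counters such as `bw_in_allowance_exceeded` and `bw_out_allowance_exceeded` that increase when traffic is queued or dropped for exceeding an allowance. The sketch below parses that output and compares two samples. The helper names are hypothetical, and the sample text uses invented values.

```python
import re

# Counters the AWS ENA driver reports when an instance allowance was exceeded.
ALLOWANCE_COUNTERS = (
    "bw_in_allowance_exceeded",
    "bw_out_allowance_exceeded",
    "pps_allowance_exceeded",
    "conntrack_allowance_exceeded",
    "linklocal_allowance_exceeded",
)

def parse_allowance_counters(ethtool_output):
    """Extract the allowance-exceeded counters from `ethtool -S <iface>` text."""
    counters = {}
    for line in ethtool_output.splitlines():
        match = re.match(r"\s*(\w+):\s*(\d+)\s*$", line)
        if match and match.group(1) in ALLOWANCE_COUNTERS:
            counters[match.group(1)] = int(match.group(2))
    return counters

def throttled_since(previous, current):
    """Return how much each counter grew between two samples (growth only)."""
    return {
        name: current[name] - previous.get(name, 0)
        for name in current
        if current[name] > previous.get(name, 0)
    }

# Illustrative samples with made-up values, shaped like `ethtool -S eth0` output.
sample_before = """NIC statistics:
     tx_timeout: 0
     bw_in_allowance_exceeded: 12
     bw_out_allowance_exceeded: 340
     pps_allowance_exceeded: 0
"""
sample_after = sample_before.replace("340", "415")

before = parse_allowance_counters(sample_before)
after = parse_allowance_counters(sample_after)
print(throttled_since(before, after))  # {'bw_out_allowance_exceeded': 75}
```

A counter that keeps growing between samples points to outbound or inbound bandwidth limits rather than CPU, memory, or disk.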
Scenario 2: Microbursts Causing Throttling
Microbursts—short spikes in traffic that exceed burst ceilings—can cause throttling even when average network utilization is low. These bursts are especially tricky because cloud metrics often report averages over minutes, masking spikes that occur in seconds.
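To see how averaging hides a microburst, consider a small worked example. All traffic numbers here are invented for illustration: a steady 500 Mbps background with a single 200 ms spike at 12 Gbps.

```python
# Traffic rate in Mbps for each 10 ms interval over one minute (6000 intervals).
# All numbers are invented for illustration.
INTERVALS_PER_SECOND = 100
rates_mbps = [500] * (60 * INTERVALS_PER_SECOND)

# A 200 ms microburst at 12 Gbps, starting 30 seconds in.
burst_start = 30 * INTERVALS_PER_SECOND
for i in range(burst_start, burst_start + 20):
    rates_mbps[i] = 12_000

minute_average = sum(rates_mbps) / len(rates_mbps)

# Average within each one-second window, then take the busiest window.
per_second = [
    sum(rates_mbps[s:s + INTERVALS_PER_SECOND]) / INTERVALS_PER_SECOND
    for s in range(0, len(rates_mbps), INTERVALS_PER_SECOND)
]

print(f"1-minute average: {minute_average:.0f} Mbps")           # 538 Mbps
print(f"busiest 1-second average: {max(per_second):.0f} Mbps")  # 2800 Mbps
print(f"true peak: {max(rates_mbps)} Mbps")                     # 12000 Mbps
```

In this made-up trace the one-minute average sits far below a 2.5 Gbps baseline, and even the busiest one-second average looks modest, yet the traffic briefly exceeded a 10 Gbps burst ceiling.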
Where Clockwork Comes In: Diagnose and Fix Cloud Network Throttling
There are many tools that measure microservices performance against the golden signals. However, there is a big gap in the market for networking tools that mitigate or prevent the performance issues that degrade those signals and the user experience.
That’s where Clockwork comes in. We specialize in quickly pinpointing and preventing issues like network throttling, especially those caused by microbursts and other hidden factors. In other words, Clockwork’s solution was built to:
- Reduce Mean-Time-to-Detection (MTTD), so problems are caught before users notice them.
- Reduce Mean-Time-to-Resolution (MTTR).
- Use high-quality, early latency signals to diagnose issues and keep them from impacting your user experience.
Here’s How Our Tools Can Help You:
Clockwork’s Latency Sensei: High-Resolution Monitoring
To diagnose throttling, you need deep visibility into your network. Clockwork’s Latency Sensei provides:
- Microsecond Precision: Track latency, throughput, and packet loss at a microsecond level.
- Detailed Insights: Pinpoint exactly when and where throttling occurs, helping you get to the root of the issue.
Clockwork’s Packet Rocket: Intelligent Traffic Shaping
Once you’ve diagnosed the problem, you’ll need to resolve it. Clockwork’s Packet Rocket helps by managing network traffic intelligently to prevent microbursts.
- Traffic Shaping: Smooth out network traffic to avoid exceeding per-second throttling thresholds.
- Consistent Performance: Ensure a steady flow of network traffic, avoiding sharp peaks that trigger throttling.
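For readers unfamiliar with traffic shaping, the general idea can be sketched with a token-bucket pacer: packets leave immediately while tokens are available and are otherwise delayed until enough tokens accumulate. This is a generic textbook illustration, not a description of how Packet Rocket works. The function name, rate, and packet sizes are all assumptions.

```python
def shape(packets, rate_bytes_per_s, burst_bytes):
    """Return departure times for (arrival_time, size_bytes) packets.

    Token-bucket pacing: packets leave immediately while tokens last,
    otherwise they wait until enough tokens have accumulated.
    Assumes packets are sorted by arrival time.
    """
    tokens = float(burst_bytes)
    clock = 0.0  # time of the last token update (also the last departure)
    departures = []
    for arrival, size in packets:
        now = max(arrival, clock)
        # Refill tokens for the time elapsed, capped at the bucket size.
        tokens = min(burst_bytes, tokens + (now - clock) * rate_bytes_per_s)
        if tokens < size:
            now += (size - tokens) / rate_bytes_per_s  # wait for missing tokens
            tokens = float(size)
        tokens -= size
        clock = now
        departures.append(now)
    return departures

# Three 1000-byte packets arriving at once, shaped to 1000 bytes/s.
print(shape([(0.0, 1000), (0.0, 1000), (0.0, 1000)], 1000, 1000))  # [0.0, 1.0, 2.0]
```

The burst that arrived in a single instant is spread over two seconds, so the sender never exceeds the configured rate.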
Clockwork’s Bandwidth Slicer: Optimize Your Network Resources
Finally, optimize your network with Clockwork’s Bandwidth Slicer, which ensures your critical applications always have the bandwidth they need.
- Priority-Based Allocation: Assign more bandwidth to high-priority applications, while limiting less critical tasks.
- Quality of Service: Guarantee low latency and high performance for your top-tier services.
- Prevent Congestion: Manage bandwidth smartly to avoid congestion and performance degradation.
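Priority-based allocation in general can be pictured as a weighted split in which no application receives more than it asks for and unused bandwidth flows to the others. The sketch below is a generic weighted fair-share calculation, not Bandwidth Slicer’s algorithm. The application names, weights, and demands are invented, and weights are assumed to be positive.

```python
def allocate(capacity_mbps, apps):
    """Split capacity across apps by weight, capped at each app's demand.

    apps maps name -> (weight, demand_mbps). Bandwidth an app does not need
    is redistributed to the others in proportion to their weights.
    """
    alloc = {name: 0.0 for name in apps}
    active = set(apps)
    remaining = float(capacity_mbps)
    while active and remaining > 0:
        total_weight = sum(apps[name][0] for name in active)
        share = {name: remaining * apps[name][0] / total_weight for name in active}
        satisfied = {name for name in active if apps[name][1] <= share[name]}
        if not satisfied:
            # Everyone wants more than their share: hand out the shares and stop.
            for name in active:
                alloc[name] = share[name]
            break
        for name in satisfied:
            alloc[name] = float(apps[name][1])
            remaining -= apps[name][1]
        active -= satisfied
    return alloc

# Hypothetical workloads: (weight, demand in Mbps) on a 1000 Mbps link.
apps = {"api": (3, 400), "batch": (1, 900), "logs": (1, 100)}
print(allocate(1000, apps))  # {'api': 400.0, 'batch': 500.0, 'logs': 100.0}
```

Here the high-priority `api` service gets everything it asks for, while the `batch` job absorbs whatever is left instead of crowding it out.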
Don’t Wait—Check for Throttling or Prevent It Altogether
Reach out to Clockwork.io for a FREE CONSULTATION to explore how we might help you. Our solutions offer significant gains in network performance and cost efficiency.
Here are the types of improvements we’ve seen:
- Network performance boost: up to a 2-3x improvement, depending on traffic shape and configuration.
- Cost reduction: Reduce cloud operational costs by 1/3 or more by avoiding unnecessary scaling and over-provisioning.
- Faster issue resolution: Reduce Mean Time to Resolution (MTTR) by 50% or more for network-related issues, improving uptime and reliability.
Clockwork’s software can be deployed across a wide variety of environments, including on-premises, cloud, edge, or hybrid networks. Compatible with bare metal, virtual machines, and containers, the solution integrates with all major cloud providers. Get up and running quickly with just a few lines of code—no access needed to the underlying cloud infrastructure.