We Analyzed $4.2M in AWS Bills. Here Are the 9 Silent Killers Draining Your Cloud Budget.
The same nine cost patterns appear in nearly every AWS account, regardless of company size. Real dollar figures, real AWS service names, and fixes you can deploy today, with no code changes required for five of them.
CloudFordge
cloudfordge.com
Cloud bills do not lie. They are, in fact, the most honest document a technology organisation produces — a precise, timestamped record of every decision, mistake, and forgotten resource going back to the day the account was opened.
After reviewing AWS Cost and Usage Reports across accounts ranging from early-stage startups to engineering teams at public companies, the same patterns surface every time. Not edge cases. Not unusual configurations. The same nine cost categories, draining budgets quietly, for months or years, in accounts that considered themselves reasonably well-managed.
The total across accounts reviewed sits north of $4.2M in annualised spend. Of that, a conservative third was recoverable without changing a single line of application code.
Three numbers that set the context:
$4.2M — total annualised AWS spend reviewed across accounts
~34% — average recoverable waste identified per account
9 — cost patterns that appear in nearly every account, regardless of company size
What follows is not a generic list of AWS pricing advice. Each item covers what the charge actually is, why it keeps appearing, the specific AWS tools that surface it, and the fix. Amounts are drawn from actual accounts with identifying details removed.
The Nine Silent Killers
01. NAT Gateway: The Bill That Scales With Your Architecture
NAT Gateway charges on two dimensions: an hourly rate per gateway ($0.045/hour per AZ, roughly $32/month just for existing) and a data processing fee of $0.045 per GB that applies to everything flowing through it. The hourly cost is predictable. The data processing cost is not.
The pattern that drives the large bills: a three-AZ VPC with a NAT Gateway per AZ, where EC2 instances or ECS tasks in private subnets are pulling container images, downloading OS patches, sending logs to CloudWatch, and pushing metrics to third-party APM tools — all through NAT. In one account, container image pulls from ECR alone were responsible for $1,100/month in NAT Gateway data processing charges, because the instances were pulling multi-gigabyte images through NAT instead of using VPC endpoints to reach ECR directly.
The second common pattern: inter-AZ data transfer routed unnecessarily through NAT. Traffic from a private subnet in us-east-1a to a resource in us-east-1b, when routed through a NAT Gateway in a third AZ, generates both data processing charges and inter-AZ transfer charges simultaneously. This is double billing that is invisible until you look at the Cost and Usage Report at the resource level.
Average monthly impact: $800 – $4,000+
Fix: Create VPC endpoints for the services your instances call most: ECR, S3, CloudWatch Logs, SSM, and Secrets Manager are the common ones. Traffic through an interface endpoint costs $0.01/GB instead of $0.045/GB through NAT, and each interface endpoint costs $0.01/hour per AZ. S3 is the exception: use a gateway endpoint, which carries no hourly or per-GB charge. Because ECR serves image layers from S3, the S3 gateway endpoint is what takes most image-pull traffic off NAT. In Cost Explorer, filter Usage Type to anything containing NatGateway-Bytes to see your exact NAT data volume. In most accounts, VPC endpoints for S3 and ECR alone recover 60–70% of NAT Gateway data charges within the first billing cycle.
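Whether an interface endpoint pays for itself depends on how much traffic it takes off NAT. The sketch below works through that arithmetic using the us-east-1 rates quoted above; the function names and the 730-hour month are illustrative assumptions, not AWS tooling.

```shell
#!/usr/bin/env bash
# Rates quoted in the article for us-east-1; adjust for your Region.
NAT_PER_GB=0.045        # NAT Gateway data processing, USD per GB
ENDPOINT_PER_GB=0.01    # interface endpoint data processing, USD per GB
ENDPOINT_PER_HOUR=0.01  # interface endpoint hourly charge, per AZ
HOURS_PER_MONTH=730     # assumption: average hours in a month

# monthly_saving <gb_per_month> <endpoint_az_count>
# Prints the net monthly saving (USD) from moving that traffic off NAT.
monthly_saving() {
  awk -v gb="$1" -v az="$2" -v nat="$NAT_PER_GB" -v ep="$ENDPOINT_PER_GB" \
      -v hr="$ENDPOINT_PER_HOUR" -v h="$HOURS_PER_MONTH" \
      'BEGIN { printf "%.2f\n", gb * (nat - ep) - az * hr * h }'
}

# break_even_gb <endpoint_az_count>
# Prints the monthly GB at which the endpoint's hourly fee is covered.
break_even_gb() {
  awk -v az="$1" -v nat="$NAT_PER_GB" -v ep="$ENDPOINT_PER_GB" \
      -v hr="$ENDPOINT_PER_HOUR" -v h="$HOURS_PER_MONTH" \
      'BEGIN { printf "%.0f\n", (az * hr * h) / (nat - ep) }'
}

monthly_saving 5000 3   # 5 TB/month through one endpoint in 3 AZs -> 153.10
break_even_gb 1         # GB/month needed to justify one endpoint in one AZ -> 209
```

A negative result from monthly_saving means the endpoint's hourly fee exceeds what it saves, which is common for low-traffic services. Gateway endpoints for S3 have no hourly fee, so they pay off at any volume.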
02. CloudWatch Logs: The Ingestion Fee Nobody Budgets For
CloudWatch Logs charges $0.50 per GB of log data ingested. The storage cost is low — $0.03/GB/month after the free tier. The ingestion cost is where accounts bleed. And unlike storage, ingestion charges accumulate whether or not anyone ever reads the logs.
Three sources account for the majority of surprise log spend. First: Lambda functions logging at DEBUG level in production, including full JSON payloads of every request and response. A Lambda handling 50 million invocations per day with a 2KB average log payload generates 100GB of log data per day: $50/day, $1,500/month, for logs that typically get reviewed only during incidents.
Second: ECS tasks and Kubernetes pods with the awslogs driver and no log group retention policy configured. CloudWatch applies no default retention. Log data from a service deprecated eighteen months ago is still being stored and billed. Third: API Gateway access logging enabled without a review of log verbosity. Full access logs for a busy API endpoint can add up faster than the compute cost of the API itself.
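Ingestion volume is easy to misjudge by a factor of a thousand, so it helps to run the numbers before the bill does. A minimal sketch, assuming the $0.50/GB ingestion rate quoted above, decimal gigabytes, and a 30-day month; log_cost_per_month is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# log_cost_per_month <invocations_per_day> <avg_log_bytes_per_invocation>
# Prints daily log volume and the monthly CloudWatch Logs ingestion cost.
log_cost_per_month() {
  awk -v n="$1" -v b="$2" 'BEGIN {
    gb_per_day = n * b / 1e9   # decimal GB, close enough for budgeting
    printf "%.1f GB/day, $%.2f/month\n", gb_per_day, gb_per_day * 0.50 * 30
  }'
}

log_cost_per_month 50000 2000      # 0.1 GB/day, $1.50/month
log_cost_per_month 50000000 2000   # 100.0 GB/day, $1500.00/month
```

At a 2KB payload, tens of thousands of invocations per day are harmless; the bill only becomes material in the tens of millions, or when payloads grow well past a few kilobytes.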
Average monthly impact: $200 – $2,000
Fix: Set log retention policies on every log group — 30 days covers most operational needs. Run the CLI commands below to find every log group with no retention set, then apply a policy in bulk. Switch Lambda functions to INFO level in production. One account reduced CloudWatch spend by $1,400/month in a single afternoon of log group housekeeping.
# List every log group with no retention policy (charging indefinitely)
aws logs describe-log-groups \
  --query "logGroups[?retentionInDays==\`null\`].[logGroupName,storedBytes]" \
  --output table
# Set 30-day retention on a specific group
aws logs put-retention-policy \
  --log-group-name "/aws/lambda/my-function" \
  --retention-in-days 30
# Bulk: apply 30-day retention to ALL groups with no policy
aws logs describe-log-groups \
  --query "logGroups[?retentionInDays==\`null\`].logGroupName" \
  --output text | tr '\t' '\n' | while read -r group; do
    aws logs put-retention-policy \
      --log-group-name "$group" \
      --retention-in-days 30
  done
03. Orphaned EBS Volumes and Snapshots: The Archaeology Tax
EBS volumes cost $0.08 per GB/month for gp3 and $0.10 for gp2, and they continue accruing charges whether or not an EC2 instance is attached. When instances are terminated, whether through autoscaling, by hand, or after a failed deployment, the root volume is deleted by default, but additional data volumes are preserved unless DeleteOnTermination is explicitly set. Across accounts with active autoscaling groups, years of terminated instances leave behind volumes that sit unattached, charged month after month.
Snapshots are the second half of this problem. EBS snapshots cost $0.05 per GB/month and are incremental — but only within a snapshot chain. Many teams stop cleaning up snapshots entirely because of this complexity and instead let them accumulate. One account had snapshots going back to late 2021 from a project that was decommissioned before most of its current engineers joined the company. The team had no idea the snapshots existed until a Cost Explorer resource-level breakdown surfaced them.
Average monthly impact: $100 – $800
Fix: Use AWS Trusted Advisor's "Underutilized EBS Volumes" check, or run a Cost Explorer resource-level report filtered by EC2-EBS usage type. For snapshots, use Amazon Data Lifecycle Manager to enforce retention policies going forward. For the backlog, run aws ec2 describe-snapshots --owner-ids self and delete anything older than your recovery window. One account freed $480/month in under an hour.
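For accounts without Trusted Advisor's cost checks, the same inventory can be pulled from the CLI. The sketch below wraps two standard EC2 calls in hypothetical helper functions; the cutoff date is a placeholder to replace with your own recovery window, and each call covers only the Region your CLI is configured for.

```shell
#!/usr/bin/env bash
# List EBS volumes in the "available" state, i.e. attached to no instance.
list_unattached_volumes() {
  aws ec2 describe-volumes \
    --filters Name=status,Values=available \
    --query "Volumes[].[VolumeId,Size,VolumeType,CreateTime]" \
    --output table
}

# list_snapshots_before <YYYY-MM-DD>
# List snapshots owned by this account that were started on or before the cutoff.
list_snapshots_before() {
  local cutoff="$1"
  aws ec2 describe-snapshots \
    --owner-ids self \
    --query "Snapshots[?StartTime<='${cutoff}'].[SnapshotId,VolumeSize,StartTime,Description]" \
    --output table
}

# Usage (requires AWS credentials):
#   list_unattached_volumes
#   list_snapshots_before 2023-01-01
```

Both commands are read-only. Before deleting snapshots, check whether any back a registered AMI, since AWS refuses to delete a snapshot while an AMI still references it.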
04. Idle RDS Instances: The Database That Outlived Its Project
A db.r6g.2xlarge RDS PostgreSQL instance in us-east-1 costs approximately $0.48/hour on-demand — $345/month before storage, backups, or Multi-AZ. Multi-AZ doubles the compute cost. An instance in Multi-AZ with 500GB gp2 storage and automated backups runs $750–$900/month with zero queries executed against it.
The lifecycle that creates this waste is consistent: a project spins up a database for development, moves to staging, then production. The development instance is never cleaned up — it is kept "just in case." A second pattern: a service is decommissioned, its ECS tasks are removed, its load balancer is deleted, but the RDS instance is left running because no one is certain whether something still depends on it. Without active DatabaseConnections metrics, that uncertainty persists indefinitely.
In one account, two db.r6g.2xlarge Multi-AZ instances had averaged under 1% CPU and zero application connections for five consecutive months. Combined cost: $2,100/month. Neither had been flagged in any infrastructure review.
Average monthly impact: $300 – $3,000
Fix: In CloudWatch, set an alarm on the DatabaseConnections metric for every RDS instance. Any instance with zero connections for 14 consecutive days should be flagged for review. Take a final snapshot, then delete. If the project is genuinely dormant, use RDS stop — instances can be stopped for up to 7 days before AWS restarts them automatically, so automate the stop on a schedule via EventBridge. AWS Compute Optimizer now provides RDS rightsizing recommendations with specific alternative instance classes and projected savings.
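Before setting up alarms, a one-off check of the peak connection count is often enough to decide whether an instance is a shutdown candidate. A minimal sketch, assuming GNU date (Linux) and a hypothetical helper name; the instance identifier is whatever appears in your RDS console.

```shell
#!/usr/bin/env bash
# max_db_connections <db-instance-identifier> [days]
# Prints the highest DatabaseConnections value seen over the window (default 14 days).
max_db_connections() {
  local db="$1" days="${2:-14}"
  local start end
  # GNU date syntax; on macOS/BSD the equivalent flag is -v-"${days}"d.
  start="$(date -u -d "${days} days ago" +%Y-%m-%dT%H:%M:%SZ)"
  end="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  aws cloudwatch get-metric-statistics \
    --namespace AWS/RDS \
    --metric-name DatabaseConnections \
    --dimensions Name=DBInstanceIdentifier,Value="$db" \
    --start-time "$start" --end-time "$end" \
    --period 86400 \
    --statistics Maximum \
    --query "max(Datapoints[].Maximum)" \
    --output text
}

# Usage (requires AWS credentials):
#   max_db_connections my-dev-database 14
```

A result of 0.0 means nothing connected during the window. A small non-zero value may just be a monitoring agent, so confirm what is connecting before deleting anything.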
05. S3 Request Costs: When the Number of Calls Matters More Than Storage Size
S3 storage pricing is well understood. S3 request pricing is not. PUT, COPY, POST, and LIST requests cost $0.005 per 1,000 requests. GET, SELECT, and all other requests cost $0.0004 per 1,000. Both numbers sound negligible. At scale they are not.
One account had $900/month in GET charges traced to a Lambda function running on a 1-minute EventBridge schedule that checked an S3 config object for feature flags on every invocation. At 1,440 invocations per day, with the Lambda making 8 GET requests per run to read different config keys, that is 11,520 requests per day, or 345,600 per month. Multiply by a microservice architecture with 54 functions performing similar config reads, and you get roughly 18.7 million requests per month, about 205 million over 11 months. The fix cost nothing: cache the S3 config object in Lambda's execution environment and refresh it every 5 minutes. Request volume dropped 98%.
LIST requests carry a separate risk: they are priced at PUT rates ($0.005/1,000), 12.5 times the cost of GET. Applications that LIST a bucket to check for new files instead of using S3 Event Notifications or SQS-triggered workflows generate disproportionate charges relative to the work being done.
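Per-request prices only become meaningful once they are multiplied by volume. The sketch below compares the two S3 Standard request tiers at an illustrative one billion requests per month; s3_request_cost is a hypothetical helper and the volume is an assumption, not a figure from any reviewed account.

```shell
#!/usr/bin/env bash
# s3_request_cost <request_count> <price_per_1000_requests>
# Prints the request charge in USD.
s3_request_cost() {
  awk -v n="$1" -v p="$2" 'BEGIN { printf "%.2f\n", n / 1000 * p }'
}

s3_request_cost 1000000000 0.0004   # one billion GETs  -> 400.00
s3_request_cost 1000000000 0.005    # one billion LISTs -> 5000.00
```

The same call volume costs 12.5 times more when it is a LIST, which is why replacing bucket polling with event notifications tends to pay off sooner than caching GETs.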
Average monthly impact: $100 – $1,200
Fix: Enable S3 Storage Lens with advanced metrics. It surfaces request count by bucket and operation type, making high-request patterns visible immediately. Replace polling patterns with event-driven architecture: S3 Event Notifications to SQS or EventBridge. Cache frequently-read objects in Lambda or ElastiCache instead of fetching from S3 on every invocation.
06. EC2 On-Demand Where Savings Plans Should Be
This is the most expensive single line item on most AWS bills, and also the most well-documented problem in cloud cost management — which makes it remarkable how consistently it appears.
A c5.2xlarge on-demand in us-east-1 costs $0.34/hour. The same instance covered by a 1-year Compute Savings Plan costs $0.215/hour, a 37% reduction with no other changes required. An m5.4xlarge on-demand runs $0.768/hour. Under a 1-year Compute Savings Plan, $0.486/hour.
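Hourly rates hide how much the gap amounts to per month. A small sketch of the arithmetic, using the rates quoted above and assuming a 730-hour month; sp_saving is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# sp_saving <on_demand_hourly> <savings_plan_hourly> [instance_count]
# Prints the percentage discount and the monthly saving in USD.
sp_saving() {
  awk -v od="$1" -v sp="$2" -v n="${3:-1}" 'BEGIN {
    printf "%.1f%% off, $%.2f/month saved\n", (od - sp) / od * 100, (od - sp) * 730 * n
  }'
}

sp_saving 0.34 0.215     # c5.2xlarge -> 36.8% off, $91.25/month saved
sp_saving 0.768 0.486    # m5.4xlarge -> 36.7% off, $205.86/month saved
```

Passing an instance count as the third argument scales the result to a fleet, which is usually the number that gets a purchasing decision approved.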
The hesitation that keeps teams on on-demand is understandable: commitment feels risky. The practical reality is that most production workloads have a stable baseline that has been running for 12+ months and will continue for another 12. That baseline is pure Savings Plan material. The variable spike above it runs on Spot or on-demand.
Compute Savings Plans, introduced in 2019, cover EC2 regardless of instance family, size, OS, tenancy, and Region, and they apply automatically to Fargate and Lambda as well. For most organisations they are the better default over older EC2 Reserved Instances, which offer a somewhat deeper discount only in exchange for locking in an instance family and Region. One account had 34 instances running stable production workloads for 18 months entirely on on-demand because "the team planned to review commitments next quarter" for six consecutive quarters. Recoverable spend: $8,400/month.
Average monthly impact: $1,000 – $10,000+
Fix: Open AWS Cost Explorer and navigate to Savings Plans → Recommendations. AWS analyses your trailing 30-day usage and recommends the optimal commitment amount. Start with the recommended amount, not a round number. Cover 70–80% of your stable baseline — leave 20–30% on on-demand for flexibility. A 1-year Compute Savings Plan with no upfront payment still saves 30–40% over on-demand. Review the recommendation quarterly.
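The same recommendation is available from the Cost Explorer API, which is convenient for a quarterly review script. A minimal sketch wrapped in a hypothetical helper function; the term, payment option, and lookback shown are choices to adjust, and the call must be made from the management (payer) account or an account with Cost Explorer access.

```shell
#!/usr/bin/env bash
# Print the headline numbers from AWS's Compute Savings Plan recommendation:
# 1-year term, no upfront payment, based on the trailing 30 days of usage.
compute_sp_recommendation() {
  aws ce get-savings-plans-purchase-recommendation \
    --savings-plans-type COMPUTE_SP \
    --term-in-years ONE_YEAR \
    --payment-option NO_UPFRONT \
    --lookback-period-in-days THIRTY_DAYS \
    --query "SavingsPlansPurchaseRecommendation.SavingsPlansPurchaseRecommendationSummary.{HourlyCommitment: HourlyCommitmentToPurchase, MonthlySavings: EstimatedMonthlySavingsAmount, SavingsPercent: EstimatedSavingsPercentage}" \
    --output table
}

# Usage (requires AWS credentials):
#   compute_sp_recommendation
```

The hourly commitment it returns is the figure to scale down to 70–80% if you want to leave headroom for workloads that may shrink.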
07. Data Transfer Out: The Egress Tax on Everything You Export
Data transfer out of AWS to the internet costs $0.09 per GB for the first 10TB/month, dropping incrementally at higher volumes. The first 100GB/month is free. Everything above it is not, and the charges appear across multiple services — EC2, S3, CloudFront, RDS, ElastiCache — rather than as a single line item.
Three patterns generate the largest bills. First: direct S3-to-internet transfers for data distribution, where CloudFront would serve the same content at $0.085/GB or less in North America and Europe (falling toward $0.02/GB at very high volume), with the first 1TB/month free and no S3 egress charge for cacheable content. Second: application servers that return large API responses (analytics dashboards, export endpoints, reporting services) without compression or pagination, transferring multi-megabyte payloads per request at full egress cost. Third: bulk analytics exports to third-party BI platforms going via the internet when PrivateLink was available. One account spent $4,200/month on this single pattern alone.
Average monthly impact: $400 – $5,000
Fix: In Cost Explorer, group by Usage Type and filter for anything containing DataTransfer-Out. This isolates exactly which services are generating egress charges and at what volume. For S3 content distribution, move behind CloudFront: data transfer from S3 to CloudFront is free. Enable gzip or Brotli compression on API responses in the application, at a reverse proxy, or in CloudFront (an ALB does not compress responses itself). For third-party analytics tools, check whether they support AWS PrivateLink. Several major BI platforms do, which replaces internet egress at $0.09/GB with PrivateLink data processing at roughly $0.01/GB.
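The Cost Explorer grouping described above can also be run from the CLI. The sketch below groups one month of spend by usage type and filters client-side for egress; egress_by_usage_type is a hypothetical helper, and the date range is an example to replace with your own billing month.

```shell
#!/usr/bin/env bash
# egress_by_usage_type <start YYYY-MM-DD> <end YYYY-MM-DD, exclusive>
# Lists usage types containing "DataTransfer-Out" with their unblended cost.
# Reads only the first month of the range; pass a single calendar month.
egress_by_usage_type() {
  aws ce get-cost-and-usage \
    --time-period "Start=$1,End=$2" \
    --granularity MONTHLY \
    --metrics UnblendedCost \
    --group-by Type=DIMENSION,Key=USAGE_TYPE \
    --query "ResultsByTime[0].Groups[?contains(Keys[0], 'DataTransfer-Out')].[Keys[0], Metrics.UnblendedCost.Amount]" \
    --output table
}

# Usage (requires AWS credentials):
#   egress_by_usage_type 2026-04-01 2026-05-01
```

Swapping the filter string for NatGateway-Bytes gives the NAT Gateway view from the same call.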
08. Lambda Over-Provisioned Memory: Paying for RAM Nobody Uses
Lambda pricing is calculated as GB-seconds: memory allocated multiplied by execution duration, multiplied by $0.0000166667. A function configured at 1024MB running for 200ms costs the same as a function at 128MB running for 1,600ms. The memory setting controls both RAM and CPU allocated — more memory means faster CPU, which reduces duration. But the relationship is not linear. Beyond a certain point, adding more memory produces no further duration reduction while continuing to increase cost.
The common failure mode is setting memory to 1GB or 1.5GB by default because "Lambda is cheap" and never revisiting the configuration. A function that actually uses 180MB of memory and finishes in 120ms at 512MB allocation might finish in 115ms at 1GB — the duration barely changes, but the cost doubles. Across a Lambda estate of 200+ functions, this default overprovisioning adds a consistent 30–50% to the Lambda line item.
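The GB-second formula is simple enough to check by hand. A minimal sketch, assuming the $0.0000166667 rate quoted above, ignoring the separate per-request charge, and using an illustrative 100 million invocations per month; lambda_cost is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# lambda_cost <memory_mb> <duration_ms> <invocations>
# Prints the compute (GB-second) charge in USD, excluding the per-request fee.
lambda_cost() {
  awk -v mb="$1" -v ms="$2" -v n="$3" 'BEGIN {
    gb_seconds = (mb / 1024) * (ms / 1000) * n
    printf "%.2f\n", gb_seconds * 0.0000166667
  }'
}

lambda_cost 512 120 100000000    # 512MB, 120ms -> 100.00
lambda_cost 1024 115 100000000   # 1GB, 115ms   -> 191.67
```

Doubling memory to shave 5ms off the duration raises the compute charge by about 92% in this example, which is the pattern Power Tuning graphs make visible per function.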
One account recovered $560/month by right-sizing 40 Lambda functions from 1GB to 256MB or 512MB defaults, based on the Max Memory Used values reported in each function's CloudWatch logs.
Average monthly impact: $100 – $600
Fix: Deploy AWS Lambda Power Tuning, an open-source Step Functions state machine that tests your function at multiple memory configurations and returns a cost and performance graph. Alternatively, check the Max Memory Used value in the REPORT line Lambda writes to CloudWatch Logs after every invocation (exposed as @maxMemoryUsed in Logs Insights). If max memory used is consistently below 60% of configured memory, reduce the allocation. AWS Compute Optimizer also provides Lambda memory recommendations once the account has opted in; find them under Compute Optimizer in the console.
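# Find Lambda functions with over-provisioned memory via Compute Optimizer
# (over-provisioning is reported as a finding reason code, not as the finding itself)
aws compute-optimizer get-lambda-function-recommendations \
  --query "lambdaFunctionRecommendations[?contains(findingReasonCodes, 'MemoryOverprovisioned')].[functionArn,currentMemorySize,memorySizeRecommendationOptions[0].memorySize]" \
  --output table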
09. Elastic IPs and Idle Load Balancers: Paying for Infrastructure Nobody Removed
AWS charges $0.005/hour for Elastic IPs not attached to a running instance, which is $3.60/month per address. Since February 2024 the same rate applies to every public IPv4 address, attached or not, so an unattached one is a charge with nothing behind it. Individually trivial. A mature AWS account active for three years commonly accumulates 20–40 unattached Elastic IPs from terminated instances, deprecated services, and infrastructure refactors. At $3.60 each, 30 unattached EIPs cost $108/month to do nothing.
Application Load Balancers and Network Load Balancers cost $0.0225/hour base — $16.20/month per ALB, regardless of traffic. Load balancers are created for services and environments and persist after the service is decommissioned because removing an ALB requires confirming that nothing still resolves to its DNS name — a check teams defer indefinitely. One account had six ALBs with zero registered targets and zero requests served in 90 days. Combined monthly cost: $97. Combined recoverable cost over the two years they had been running: $2,300.
Average monthly impact: $50 – $500
Fix: AWS Trusted Advisor has dedicated checks for "Idle Load Balancers" and "Unassociated Elastic IP Addresses" on Business and Enterprise support plans. For Developer support, run the CLI commands below. Any ALB with a CloudWatch RequestCount of zero over 30 days is a candidate for removal. Release unattached EIPs immediately — they provide zero value while unattached.
# Find all unattached Elastic IPs
aws ec2 describe-addresses \
  --query "Addresses[?AssociationId==\`null\`].[PublicIp,AllocationId]" \
  --output table
# Find load balancers (then check RequestCount in CloudWatch for each)
aws elbv2 describe-load-balancers \
  --query "LoadBalancers[*].[LoadBalancerName,DNSName,CreatedTime]" \
  --output table
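The RequestCount check can be scripted per load balancer. A minimal sketch for Application Load Balancers, assuming GNU date (Linux); alb_request_count is a hypothetical helper that takes the full load balancer ARN and derives the CloudWatch dimension from it.

```shell
#!/usr/bin/env bash
# alb_request_count <load-balancer-arn> [days]
# Prints the total number of requests the ALB served over the window (default 30 days).
alb_request_count() {
  local arn="$1" days="${2:-30}"
  # CloudWatch identifies an ALB by the ARN suffix, e.g. app/my-alb/50dc6c495c0c9188
  local dimension="${arn#*:loadbalancer/}"
  local start end
  # GNU date syntax; on macOS/BSD the equivalent flag is -v-"${days}"d.
  start="$(date -u -d "${days} days ago" +%Y-%m-%dT%H:%M:%SZ)"
  end="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  aws cloudwatch get-metric-statistics \
    --namespace AWS/ApplicationELB \
    --metric-name RequestCount \
    --dimensions Name=LoadBalancer,Value="$dimension" \
    --start-time "$start" --end-time "$end" \
    --period 86400 \
    --statistics Sum \
    --query "sum(Datapoints[].Sum)" \
    --output text
}

# Usage (requires AWS credentials; the ARN below is a placeholder):
#   alb_request_count arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/app/my-alb/50dc6c495c0c9188
```

A total of 0 over 30 days marks a removal candidate. Network Load Balancers publish under the AWS/NetworkELB namespace with different metrics, so this helper does not apply to them.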
What the Numbers Add Up To
Across accounts where all nine categories were present, the recoverable amount ranged from 22% to 51% of total monthly spend. The median was 34%. The fixes that delivered the largest single-month impact were, in order: Compute Savings Plans, NAT Gateway optimisation via VPC endpoints, and idle RDS instance cleanup. Together, those three accounted for roughly 70% of recoverable spend in most accounts.
| # | Category | Typical Monthly Impact | Fix Complexity | Time to Recover |
|---|---|---|---|---|
| 01 | NAT Gateway data processing | $800 – $4,000+ | Medium | Next billing cycle |
| 02 | CloudWatch Logs ingestion | $200 – $2,000 | Easy | Same day |
| 03 | Orphaned EBS volumes & snapshots | $100 – $800 | Easy | Same day |
| 04 | Idle RDS instances | $300 – $3,000 | Easy | Same day |
| 05 | S3 request volume (polling patterns) | $100 – $1,200 | Medium | 1–2 sprints |
| 06 | EC2 on-demand vs Savings Plans | $1,000 – $10,000+ | Medium | Next billing cycle |
| 07 | Data transfer out / egress | $400 – $5,000 | Medium | 1–2 sprints |
| 08 | Lambda memory over-provisioning | $100 – $600 | Easy | Same day |
| 09 | Idle EIPs and load balancers | $50 – $500 | Easy | Same day |
Items 2, 3, 4, 8, and 9 require no code changes and no architectural work. They are configuration and cleanup tasks. In most accounts, doing only those five recovers 15–20% of the monthly bill in a single afternoon.
How to Run Your Own Audit in 45 Minutes
You do not need a FinOps platform to find these problems. AWS Cost Explorer, Trusted Advisor, and Compute Optimizer cover the majority of what needs to be found, and all three are accessible from the console today.
# 1. Top 10 services by spend for one month (set the dates to your most recent full month)
#    Amount is returned as a string, so convert it before sorting
aws ce get-cost-and-usage \
  --time-period Start=2026-04-01,End=2026-05-01 \
  --granularity MONTHLY \
  --metrics "UnblendedCost" \
  --group-by Type=DIMENSION,Key=SERVICE \
  --query 'ResultsByTime[0].Groups | sort_by(@, &to_number(Metrics.UnblendedCost.Amount)) | reverse(@) | [:10]'
# 2. Find all unattached Elastic IPs
aws ec2 describe-addresses \
  --query "Addresses[?AssociationId==\`null\`].[PublicIp,AllocationId]" \
  --output table
# 3. Log groups with no retention policy
aws logs describe-log-groups \
  --query "logGroups[?retentionInDays==\`null\`].[logGroupName,storedBytes]" \
  --output table
# 4. Lambda functions over-provisioned by memory
aws compute-optimizer get-lambda-function-recommendations \
  --query "lambdaFunctionRecommendations[?contains(findingReasonCodes, 'MemoryOverprovisioned')].[functionArn,currentMemorySize,memorySizeRecommendationOptions[0].memorySize]" \
  --output table
After running these four queries, most accounts have enough information to prioritise a week of FinOps work. The Compute Savings Plan recommendation in Cost Explorer is the highest-return action that requires no engineering time — it is a purchasing decision, not a technical one, and it takes less than five minutes to apply.
The Audit Checklist
Start here. In this order. This week.
1. Open AWS Cost Explorer. Group by Service, filter to the last 90 days. Identify your top three cost drivers. If EC2, RDS, or Data Transfer appear in the top three, you have items on this list.
2. Check Cost Explorer → Savings Plans → Recommendations. If AWS is recommending a commitment, you are leaving money on the table every day you delay.
3. Run describe-addresses to find unattached Elastic IPs. Release every one. This takes five minutes and costs nothing to fix.
4. Run describe-log-groups filtered to groups with no retention policy. Set 30-day retention on all of them. One CLI loop, immediate impact next billing cycle.
5. Open AWS Trusted Advisor. Review the Cost Optimization section. "Idle Load Balancers" and "Underutilized EBS Volumes" are two checks that require zero investigation: the data is right there.
6. Open Compute Optimizer. Check Lambda recommendations. Any function flagged as memory over-provisioned with a recommended lower memory setting can be changed in under a minute.
7. In Cost Explorer, filter Usage Type to anything containing NatGateway-Bytes. If that number is larger than $200/month, audit which services are routing through NAT and evaluate VPC endpoints for the top offenders.
8. Review RDS instances in the console. For any instance where you are not certain it is receiving active connections, check the CloudWatch DatabaseConnections metric over the last 30 days. Zero or near-zero means shutdown candidate.
The bill does not care that the project ended or that the team planned to clean it up next quarter. It charges anyway, every month, until someone makes it stop.
Published by the CloudFordge Founders · cloudfordge.com · Free cloud certification practice for every learner.

