The AWS Data Transfer Bill Nobody Warned You About
The AWS bill arrives. Compute is roughly what you expected. Storage is under budget. Then there is a line item labeled “Data Transfer” for $38,000 that nobody predicted, nobody owns, and nobody can immediately explain.
This is not an unusual story. Data transfer costs are consistently the most surprising charge on AWS bills — particularly for teams that have invested time optimizing EC2 instance types and RDS sizing but never sat down to map their data flows against AWS’s pricing model.
The pricing is not hidden. It is publicly documented. The problem is that it is genuinely complex — charges vary by direction, by whether traffic crosses an Availability Zone boundary, by whether it crosses a regional boundary, and by which AWS service is involved. Very few engineers have all of these rules in mind when designing an architecture.
This post maps the main cost sources, the most common architectural mistakes, and the specific optimizations that have the largest impact.
What AWS Actually Charges For
Internet Egress
Traffic leaving AWS to the public internet is the most expensive data transfer category. The rate is tiered. In us-east-1 it is approximately $0.09 per GB for the first 10 TB per month and $0.085 per GB for the next 40 TB, then $0.07 and $0.05 per GB at higher volumes. The first 100 GB each month is free. For many applications, this is the primary driver of data transfer costs.
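To see how the tiers combine, here is a minimal Python sketch of a monthly internet egress bill. It assumes the us-east-1 list-price tiers quoted above ($0.09, $0.085, $0.07, then $0.05 per GB above 150 TB), treats 1 TB as 1,024 GB, and ignores the monthly free allowance. Check current pricing before relying on the numbers.

```python
import math

# Assumed us-east-1 internet egress tiers: (size of the tier in GB, USD per GB).
# 1 TB is taken as 1,024 GB; the free allowance is not modeled.
EGRESS_TIERS = [
    (10 * 1024, 0.09),    # first 10 TB
    (40 * 1024, 0.085),   # next 40 TB
    (100 * 1024, 0.07),   # next 100 TB
    (math.inf, 0.05),     # everything above 150 TB
]


def monthly_egress_cost(gb: float) -> float:
    """Return the tiered internet egress charge in USD for `gb` GB in one month."""
    remaining = gb
    cost = 0.0
    for tier_size, price in EGRESS_TIERS:
        billed = min(remaining, tier_size)  # volume that falls into this tier
        cost += billed * price
        remaining -= billed
        if remaining <= 0:
            break
    return cost
```

For example, `monthly_egress_cost(50 * 1024)` comes to about $4,403: $921.60 for the first 10 TB plus $3,481.60 for the next 40 TB at the lower rate. Each additional GB is billed at the rate of the tier it lands in. This is why volume reductions save less per GB than the headline $0.09 suggests once you are past the first tier.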
Critically: traffic coming into AWS from the internet is free. Charges are directional. The asymmetry matters — applications with large upload volumes are not the primary data transfer risk; applications with large download volumes are.
Cross-Region Transfer
Data transferred between AWS regions (for example, from us-east-1 to eu-west-1) is charged at approximately $0.02 per GB for most region pairs. The charge is billed on the sending side, and data arriving in a region is not charged. Some region pairs cost more. This applies to direct region-to-region transfers between resources. The rate sounds modest but accumulates quickly in multi-region architectures with significant inter-region replication or API traffic.
Cross-AZ Transfer
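This is the charge that surprises most teams. Data transferred between resources in different Availability Zones within the same region costs $0.01 per GB in each direction. The sending side pays $0.01 for data out and the receiving side pays $0.01 for data in, so every GB that crosses the boundary costs $0.02 in total. The rate is lower than internet egress, but the volumes can be enormous. Examples include every database read from an application server in a different AZ, every load-balanced request routed to a target in another AZ (for example, through a Network Load Balancer with cross-zone load balancing enabled), and every cache miss that triggers a cross-AZ database fetch.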
A high-throughput application doing 10 TB of cross-AZ database reads per month is paying approximately $200/month for that traffic pattern alone. Scale that to 100 TB and the charge is $2,000/month — just for traffic between AZs in the same region.
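The per-GB rules above fit in a small lookup table. This is a sketch under stated assumptions. The rates are the approximate figures quoted in this post. Cross-AZ traffic is billed $0.01/GB to both sender and receiver. Cross-region traffic is billed once, at an assumed flat $0.02/GB, to the sending region, although real inter-region rates vary by region pair. The `Path` names are hypothetical.

```python
from enum import Enum


class Path(Enum):
    SAME_AZ = "same_az"
    CROSS_AZ = "cross_az"
    CROSS_REGION = "cross_region"


# Approximate USD per GB charged to each side of a transfer: (sender, receiver).
RATES = {
    Path.SAME_AZ: (0.0, 0.0),        # private IPs within one AZ: free
    Path.CROSS_AZ: (0.01, 0.01),     # billed out of one AZ and into the other
    Path.CROSS_REGION: (0.02, 0.0),  # assumed flat rate; inbound to a region is free
}


def transfer_cost(gb: float, path: Path) -> float:
    """Total USD charged across both ends for moving `gb` GB over `path`."""
    out_rate, in_rate = RATES[path]
    return gb * (out_rate + in_rate)
```

`transfer_cost(10 * 1024, Path.CROSS_AZ)` gives $204.80, which matches the roughly $200/month figure above. Scaling the volume tenfold scales the charge tenfold, because there are no volume tiers in this simplified model.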
S3 Egress to EC2 and the Internet
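Data transfer between S3 and EC2 in the same region is free, whichever network path it takes. The NAT Gateway is where the cost comes in. Instances in private subnets that reach S3 without a VPC endpoint route that traffic through a NAT Gateway and pay its data processing fee ($0.045/GB) on every byte. A VPC gateway endpoint for S3 avoids that fee. This distinction is important and commonly missed.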
S3 to CloudFront is free. S3 to the internet (direct) is charged at internet egress rates. S3 to a different region is charged at cross-region rates.
NAT Gateway
NAT Gateway charges have two components (us-east-1 rates). The first is an hourly fee of $0.045/hour, about $32/month per NAT Gateway. The second is a data processing fee of $0.045 per GB processed. The processing fee applies to all traffic routed through the NAT Gateway, in both directions. High-throughput applications routing significant volumes through a NAT Gateway pay this fee on top of any applicable internet egress charges.
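A rough monthly NAT Gateway estimate can be sketched as follows. It assumes the us-east-1 rates above ($0.045 per hour and $0.045 per GB processed) and a 730-hour month. Internet egress, where it applies, is billed separately and is not included.

```python
HOURS_PER_MONTH = 730          # assumed average month length
NAT_HOURLY_RATE = 0.045        # USD per NAT Gateway-hour (assumed us-east-1 rate)
NAT_PROCESSING_RATE = 0.045    # USD per GB processed, in either direction


def nat_gateway_monthly_cost(gateways: int, gb_processed: float) -> float:
    """Hourly charges for `gateways` NAT Gateways plus processing for all traffic through them."""
    hourly = gateways * HOURS_PER_MONTH * NAT_HOURLY_RATE
    processing = gb_processed * NAT_PROCESSING_RATE
    return hourly + processing
```

Consider three NAT Gateways (one per AZ) processing 20 TB in a month. They cost about $98.55 in hourly fees and $921.60 in processing fees. At any meaningful throughput, the per-GB component dominates.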
API Gateway Egress
API Gateway bills data transferred out to clients at the standard EC2 data transfer out rates ($0.09 per GB for the first 10 TB), in addition to the per-request fees. This can be a meaningful cost driver for APIs with large response payloads, such as large JSON documents, binary data, or uncompressed content.
CloudFront Origin Fetch
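Data transfer from AWS origins (S3, EC2, Elastic Load Balancing) to CloudFront edge locations is free. CloudFront's own bill covers data sent from its edge locations to viewers, plus per-request fees. CloudFront's roughly $0.02 per GB origin rate (US and Europe) applies in the other direction, to data it forwards to your origin, such as POST and PUT request bodies. Cache misses still cost something. Each one is a request your origin must serve, and an origin outside AWS is billed for egress by its own provider.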
The Common Architectural Mistakes
Cross-AZ Database Reads
The most common and most expensive pattern: application servers deployed across multiple Availability Zones for high availability, but the primary database is in a single AZ. Every read from an application server in AZ-b or AZ-c to the database in AZ-a incurs cross-AZ charges.
This pattern is pervasive. In an active multi-AZ deployment behind an Application Load Balancer, the ALB distributes traffic across all AZs. If the database is not in the same AZ as the application server handling the request, you pay cross-AZ charges on the database round-trip.
Cost impact: At 100 GB of cross-AZ database reads per day, the charge is approximately $2/day or $730/year. At 1 TB/day, the charge is approximately $20/day or $7,300/year. High-throughput transactional applications can easily exceed this.
Services in Wrong Regions
Teams frequently provision services in different regions for administrative convenience — a staging environment in us-west-2, production in us-east-1 — and then add integrations that send data between them. If the staging environment’s load tests generate traffic against a production API, or if a logging pipeline sends data from one region to a central logging account in another region, cross-region charges accumulate.
Multi-region architectures for resilience are justified. Multi-region architectures because someone clicked the wrong region when launching a resource are not.
S3 Access Without VPC Endpoints
EC2 instances in a private subnet accessing S3 must route through either a VPC endpoint or a NAT Gateway. Without a VPC endpoint, every S3 operation routes through the NAT Gateway and incurs the NAT Gateway data processing fee ($0.045/GB).
For applications with significant S3 traffic — object storage backends, artifact repositories, log delivery, ML model serving — the NAT Gateway fee on S3 traffic is entirely avoidable.
Fix: Create an S3 VPC Gateway Endpoint (free to create, no hourly charge) and configure route tables to route S3 traffic to the endpoint rather than the NAT Gateway.
Uncompressed Data at High Volume
API responses, log streams, and data pipeline outputs that are not compressed move more bytes and pay proportionally higher data transfer charges. Gzip or Brotli compression on API responses can reduce payload sizes by 60–80% for typical JSON workloads. At high volumes, the CPU cost of compression is typically small compared to the data transfer savings.
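To check the ratio on your own payloads, Python's standard `gzip` module is enough. The sample records below are made up for illustration. Real ratios depend on how repetitive your JSON is, so run this against captured responses rather than trusting the synthetic number.

```python
import gzip
import json


def gzip_ratio(payload: bytes, level: int = 6) -> float:
    """Return original size divided by gzip-compressed size."""
    compressed = gzip.compress(payload, compresslevel=level)
    return len(payload) / len(compressed)


# Hypothetical API response: a list of similar records, as list endpoints often return.
records = [
    {"id": i, "status": "active", "region": "us-east-1", "tags": ["web", "prod"]}
    for i in range(1000)
]
body = json.dumps(records).encode("utf-8")

ratio = gzip_ratio(body)
print(f"{len(body)} bytes, ratio {ratio:.1f}:1, {1 - 1 / ratio:.0%} smaller")
```

Compression level 6 is a common middle ground between CPU cost and ratio. Higher levels usually buy only a few extra percent.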
CloudFront Origin Misconfiguration
CloudFront deployed in front of an S3 bucket or EC2 origin should have caching configured correctly. Two common problems are an overly aggressive cache invalidation policy and a configuration that effectively disables caching for frequently accessed content. Either way, every request goes to the origin. You still pay CloudFront's egress to viewers, plus the origin's compute and request load. If the origin sits outside AWS, you also pay egress from the origin's provider.
The worst-case CloudFront misconfiguration is one where all requests bypass the cache and go to the origin. CloudFront then adds request fees and operational complexity without reducing origin load at all, which eliminates most of the reason for putting it there.
Ignoring Cross-Account Data Transfer
AWS organizations frequently have production, staging, development, and shared services accounts. Crossing an account boundary does not by itself change the price. Traffic between accounts follows the same AZ and region rules as traffic within one account. The connectivity in between, such as Transit Gateway or PrivateLink, adds its own per-GB processing fees. Centralized logging architectures, shared build systems, and cross-account API calls all generate this traffic.
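Co-locating resources across accounts has a trap. AZ names such as us-east-1a are mapped to physical zones independently in each account, so the same name can mean different zones. AZ IDs (such as use1-az1) are consistent across accounts. The sketch below uses boto3's EC2 `describe_availability_zones`, which returns both `ZoneName` and `ZoneId`. The client is passed in so the function can be exercised without AWS credentials.

```python
def az_name_to_id(ec2_client) -> dict:
    """Map this account's AZ names (e.g. us-east-1a) to AZ IDs (e.g. use1-az1)."""
    response = ec2_client.describe_availability_zones()
    return {
        zone["ZoneName"]: zone["ZoneId"]
        for zone in response["AvailabilityZones"]
    }
```

Run it in each account with an EC2 client for the same region, for example `az_name_to_id(boto3.client("ec2", region_name="us-east-1"))`. Then match placement on `ZoneId` rather than `ZoneName`.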
Practical Optimizations
1. VPC Gateway Endpoints for S3 and DynamoDB
VPC Gateway Endpoints for S3 and DynamoDB are free to create and route traffic directly between your VPC and the service without passing through a NAT Gateway. For any application with significant S3 or DynamoDB traffic from instances in private subnets, this is a zero-risk, zero-cost optimization.
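To set it up, create the endpoint in the VPC console and associate the route tables used by your private subnets. Then check any S3 bucket policies that restrict access by public source IP (aws:SourceIp). Requests through the endpoint arrive from private addresses, so those conditions need to be rewritten, typically using aws:SourceVpce or aws:SourceVpc. The change is usually low-risk and takes under 30 minutes.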
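The same setup can be scripted. The minimal sketch below uses boto3's EC2 `create_vpc_endpoint` and assumes you already know the VPC ID and the route table IDs of the private subnets. The client is passed in, and any IDs you supply are placeholders for your own.

```python
def create_s3_gateway_endpoint(ec2_client, vpc_id: str, route_table_ids: list, region: str) -> str:
    """Create an S3 gateway endpoint for `vpc_id` and return its endpoint ID."""
    response = ec2_client.create_vpc_endpoint(
        VpcEndpointType="Gateway",
        VpcId=vpc_id,
        ServiceName=f"com.amazonaws.{region}.s3",
        # Associating route tables adds a route for the S3 prefix list,
        # so S3 traffic from these subnets no longer goes through the NAT Gateway.
        RouteTableIds=route_table_ids,
    )
    return response["VpcEndpoint"]["VpcEndpointId"]
```

For DynamoDB, the same call works with the service name `com.amazonaws.<region>.dynamodb`. Only the route tables passed in are affected, so you can roll the change out one subnet group at a time.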
Expected savings: Eliminates NAT Gateway data processing charges ($0.045/GB) on all S3 and DynamoDB traffic. For an application processing 100 TB of S3 traffic per month through NAT Gateway, this saves approximately $4,500/month.
2. Co-locate Services in the Same AZ
For services with high read throughput between them — application servers and database, application servers and cache cluster — co-locating in the same AZ eliminates cross-AZ charges on that traffic. This trades some fault tolerance (a single AZ outage affects all traffic between those services) for cost reduction.
The practical approach: identify your highest-volume cross-AZ traffic pair, evaluate whether co-location is operationally acceptable for that specific pair, and deploy accordingly. This does not mean abandoning multi-AZ deployment for everything — it means being deliberate about which high-volume service pairs need to be co-located.
With Aurora, the cluster reader endpoint load-balances connections across replicas that may sit in any AZ. Configuring each application server to use the instance endpoint of a replica in its own AZ keeps reads in the same AZ. The tradeoff is that if that replica or its AZ becomes unavailable, the application loses that read capacity until it is reconfigured. For standard RDS read replicas, each replica has its own endpoint, so the same choice applies when deciding which replica each server uses.
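One way to soften that tradeoff is to make the choice in application configuration with a fallback. The sketch below is hypothetical. Given the AZ the server runs in and a map of replica instance endpoints by AZ, it prefers a same-AZ replica and otherwise falls back to the cluster reader endpoint. All hostnames are placeholders, and how the server learns its own AZ (for example, from instance metadata) is left out.

```python
def choose_read_endpoint(local_az: str, replicas_by_az: dict, reader_endpoint: str) -> str:
    """Return a same-AZ replica endpoint if one exists, else the cluster reader endpoint."""
    # Same-AZ reads avoid the $0.01/GB charge on each side of the connection.
    same_az = replicas_by_az.get(local_az)
    if same_az:
        return same_az
    # No replica in this AZ: accept cross-AZ charges rather than losing read capacity.
    return reader_endpoint
```

This only decides at configuration time. It does not detect a replica that fails later, so you would pair it with health checks or a retry against the reader endpoint.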
3. CloudFront for Egress Reduction
CloudFront’s per-GB egress rates in North America and Europe start slightly below standard EC2 internet egress (about $0.085/GB versus $0.09/GB for the first 10 TB). They fall further at higher volumes, and the first 1 TB each month is free. For applications with significant outbound traffic to end users, routing through CloudFront can reduce egress costs.
More importantly, cache hits never reach the origin. Suppose 80% of your traffic is cacheable and you achieve an 80% hit rate on it. Then 64% of your total traffic is served without touching the origin at all. Transfer from AWS origins to CloudFront is already free, so the saving shows up as reduced origin compute, load balancer, and request load rather than as a smaller transfer line item.
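The 64% figure is simply the product of the cacheable fraction and the hit rate. The short sketch below lets you try other mixes. Both inputs are assumptions you would take from your own traffic analysis and cache statistics.

```python
def origin_request_share(cacheable_fraction: float, hit_rate: float) -> float:
    """Fraction of all requests that still reach the origin.

    Assumes `hit_rate` is measured over cacheable requests only;
    uncacheable requests always go to the origin.
    """
    served_from_edge_cache = cacheable_fraction * hit_rate
    return 1.0 - served_from_edge_cache
```

With the numbers above, `origin_request_share(0.8, 0.8)` is 0.36, meaning about a third of requests still hit the origin. Raising the hit rate is often easier than making more content cacheable.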
Evaluate CloudFront for: static assets (images, video, documents), API responses that have reasonable cache TTLs, large file downloads.
4. Compress API Responses
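Enable response compression in API Gateway (REST APIs have a minimum compression size setting), in CloudFront, or in your application framework. An ALB does not compress responses itself, so behind an ALB the application has to do it. The change is typically a single configuration setting or a middleware addition. Gzip on JSON API responses commonly shrinks payloads by 60–80% (roughly 2.5:1 to 5:1), and more on highly repetitive data.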
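Expected savings: suppose your API Gateway delivers 50 TB of JSON responses per month. Compressing them to 10 TB removes 40 TB of data transfer. Because egress pricing is tiered, those 40 TB are the ones that would have been billed at about $0.085/GB. The saving is therefore roughly $3,400/month (40,000 GB × $0.085).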
5. Right-Size Data Pipelines
Log aggregation, analytics pipelines, and data replication often have implicit assumptions baked in — every log line, every event, every database change replicated to a central store. Audit these pipelines for data volume and regional routing:
- Are logs being shipped cross-region? Can the analysis be done in the source region instead?
- Are all log lines valuable, or can sampling reduce volume by 80–90% for high-frequency events?
- Is database replication including binary log events that are not needed by downstream consumers?
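Sampling works best when it is deterministic, so that all log lines for a given request are kept or dropped together across services. The minimal sketch below assumes each event is a dict with `level` and `request_id` fields (hypothetical names) and that warnings and errors should never be sampled away.

```python
import hashlib


def keep_event(event: dict, sample_rate: float = 0.1) -> bool:
    """Always keep warnings and errors; keep roughly `sample_rate` of everything else."""
    if event.get("level") in ("WARN", "ERROR"):
        return True
    # Hash the request ID to a stable number in [0, 1), so every service makes
    # the same keep/drop decision for all log lines of one request.
    digest = hashlib.sha256(event["request_id"].encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:6], "big") / 2**48
    return bucket < sample_rate
```

At the default 10% rate this drops about 90% of routine events before they are shipped, which is the kind of volume cut the questions above are looking for.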
Reducing the data volume in a pipeline by 70% reduces data transfer charges for that pipeline by 70%.
6. Use the AWS Pricing Calculator to Model Before You Build
The AWS Pricing Calculator allows you to model architecture data flows before deploying. For any new feature or service that involves significant data movement — bulk data processing, video streaming, ML inference, analytics ingestion — spend 30 minutes modeling the transfer costs before finalizing the architecture.
The common discovery: a design that seemed reasonable in terms of compute and storage looks very different when the data transfer costs are made explicit.
Finding Your Current Data Transfer Costs
AWS Cost Explorer breaks down data transfer costs by service and by transfer type. The path is Cost Explorer → Filter by Service → Data Transfer → group by Usage Type. The Usage Type dimension reveals the specific traffic types, which in most regions carry a region prefix such as USW2-. Examples are “DataTransfer-Regional-Bytes” (cross-AZ), “DataTransfer-Out-Bytes” (internet egress), and “DataTransfer-Out-Bytes-CloudFront” (CloudFront).
AWS Cost and Usage Reports (CUR) provide line-item granularity. If your data transfer costs are significant enough to investigate thoroughly, enable CUR export to S3 and query it with Athena. The product_transfer_type column identifies the specific transfer type. The line_item_usage_amount column gives the usage quantity, which is in GB for data transfer line items. Grouping by line_item_resource_id identifies which specific resources are generating the charges.
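Once CUR data is queryable, the analysis is essentially a group-by. The sketch below is plain Python over rows already exported from Athena (for example, read with `csv.DictReader`). It assumes Athena-style column names `line_item_usage_type`, `line_item_resource_id` and `line_item_unblended_cost`. The usage-type substrings used to spot transfer charges are assumptions to adjust against what actually appears in your own CUR.

```python
from collections import defaultdict

# Substrings assumed to identify transfer-related usage types
# (region prefixes such as USW2- vary, so match on substrings).
TRANSFER_MARKERS = ("DataTransfer", "NatGateway-Bytes", "-Out-Bytes")


def top_transfer_resources(rows, limit: int = 10):
    """Sum unblended cost per (usage type, resource) for transfer usage types, largest first."""
    totals = defaultdict(float)
    for row in rows:
        usage_type = row["line_item_usage_type"]
        if not any(marker in usage_type for marker in TRANSFER_MARKERS):
            continue
        key = (usage_type, row.get("line_item_resource_id") or "(no resource id)")
        totals[key] += float(row["line_item_unblended_cost"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:limit]
```

The top few entries usually point straight at the architectural pattern responsible, such as a NAT Gateway carrying S3 traffic or an instance reading across AZs.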
VPC Flow Logs provide the ground-truth view of network traffic between resources. For cross-AZ traffic analysis specifically, flow log records include the source and destination IP addresses — which you can cross-reference with your subnet-to-AZ mapping to identify high-volume cross-AZ flows.
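The cross-reference can be sketched as follows. It assumes flow log records in the default format, whose fields are `version account-id interface-id srcaddr dstaddr srcport dstport protocol packets bytes start end action log-status`. It also assumes a subnet-CIDR-to-AZ map that you supply (the CIDRs in any example are placeholders). If flow logs are enabled on both ends, each flow is recorded at both the sender's and the receiver's network interface, so totals can be double counted.

```python
import ipaddress
from collections import Counter


def cross_az_bytes(log_lines, subnet_azs: dict) -> Counter:
    """Sum bytes per (source AZ, destination AZ) pair for flows that cross AZs."""
    networks = [(ipaddress.ip_network(cidr), az) for cidr, az in subnet_azs.items()]

    def az_of(ip: str):
        addr = ipaddress.ip_address(ip)
        for network, az in networks:
            if addr in network:
                return az
        return None  # address is outside the subnets you mapped

    totals = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) < 14 or fields[0] == "version":  # skip header and short rows
            continue
        if fields[9] == "-":  # NODATA/SKIPDATA records carry no byte count
            continue
        src_az, dst_az = az_of(fields[3]), az_of(fields[4])
        if src_az and dst_az and src_az != dst_az:
            totals[(src_az, dst_az)] += int(fields[9])
    return totals
```

Sorting the result with `most_common()` surfaces the heaviest AZ pairs. Joining the same records on IP addresses then shows which services sit at each end.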
The Bottom Line
Data transfer costs are not opaque — they are predictable from architecture, and they are reducible through architectural change. The teams that get surprised by large data transfer bills are typically those who optimized compute and storage in detail but never applied the same discipline to data flows.
The three changes with the highest return on investment for most architectures: VPC endpoints for S3 and DynamoDB (free, immediate), API response compression (low effort, immediate), and explicit AZ co-location of high-throughput service pairs (requires planning, high impact). Start there before investing in more complex optimization strategies.
Modeling data flows in the architecture design phase — before the bill arrives — is significantly cheaper than discovering the problem in production.
See AWS’s full data transfer pricing documentation for current rates before making cost projections.
